I’m a big fan of using Jupyter Notebooks for Python projects, but one downside is that version control is a pain. Commits become very large and illegible if you opt to track the entire notebook with the output cells, especially when graphics are included.
There are a number of proposed solutions, but they require either changing your git configuration or generating a Jupyter configuration file and modifying it. Those settings can differ from user to user, and nothing in the code or repo shows how the changes are being tracked. Below I outline an approach that might be better because the solution is included directly in a notebook cell.
I use a tool called nbstripout (installable via pip) along with a bash command in a notebook cell to solve this problem. Just add a cell like this one to your notebook, set the filenames, save the notebook, then run the cell to create a cleaned version of your notebook without the output.
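A minimal sketch of such a cell, written in Python rather than bash and assuming nbstripout is installed and on your PATH. It relies on nbstripout reading a notebook from stdin and writing the stripped copy to stdout when no filename is given. The helper name `clean_notebook` and its `cmd` parameter are hypothetical names chosen for illustration, and the two filenames are placeholders to set for your own project.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Placeholder filenames: set these for your own project.
WORKING = "notebook_working.ipynb"
CLEAN = "notebook_clean.ipynb"


def clean_notebook(working, clean, cmd=("nbstripout",)):
    """Pipe `working` through nbstripout and write the stripped copy to `clean`.

    `cmd` is a hypothetical hook so a different filter can be swapped in.
    """
    src, dst = Path(working), Path(clean)
    # With no filename argument, nbstripout filters stdin to stdout,
    # so the working notebook itself is left unchanged.
    result = subprocess.run(
        list(cmd), input=src.read_bytes(), stdout=subprocess.PIPE, check=True
    )
    dst.write_bytes(result.stdout)
    return dst.resolve()


# Only run when the working notebook is actually present.
if Path(WORKING).exists():
    path = clean_notebook(WORKING, CLEAN)
    print(f"{datetime.now():%Y-%m-%d %H:%M:%S} Cleaned file created at: {path}")
```

Piping through stdin matters here: given a filename, nbstripout strips that file in place, which would wipe the outputs from the working notebook as well.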
2017-06-23 12:57:40 Cleaned file created at: <current directory>/notebook_clean.ipynb
Then use git to track the cleaned version and only commit your working version when you need to push viewable results. This way people can easily see what changes have been made to the clean version and merge changes into the working version more confidently.
You could even use .gitignore to completely untrack the working version if you don’t need to display the output anywhere:
# .gitignore
notebook_working.ipynb
Another option is to use Jupyter’s built-in nbconvert to output your notebook as a Python file, then track changes to that file to make commits easier to read. This approach is nice because it doesn’t require an external dependency, but the downside is that you still need to track the notebook and its incremental outputs.
Here’s an example of how this could work:
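A minimal sketch of a conversion cell, assuming the `jupyter` command is on your PATH. It uses nbconvert's `--to script` and `--output-dir` options; the helper name `nbconvert_command` is hypothetical, and the notebook and folder names are taken from the log output shown in this post.

```python
import subprocess
from pathlib import Path

# Names taken from the example output; change them for your project.
NOTEBOOK = Path("notebook1.ipynb")
OUTPUT_DIR = Path("nbconvert_test")


def nbconvert_command(notebook, output_dir):
    """Build the `jupyter nbconvert` call that exports a notebook as a script."""
    return [
        "jupyter", "nbconvert",
        "--to", "script",                 # export the code cells as a .py file
        str(notebook),
        "--output-dir", str(output_dir),  # folder that receives the script
    ]


# Only run when the notebook is actually present.
if NOTEBOOK.exists():
    subprocess.run(nbconvert_command(NOTEBOOK, OUTPUT_DIR), check=True)
```

Building the argument list in a small function keeps the exact command visible in the notebook, so anyone reading the repo can see how the tracked script was produced.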
[NbConvertApp] Converting notebook <current directory>/notebook1.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook1.py
Then just track the resulting Python file each time you commit the notebook. If you have multiple files that you want to convert at once for version control, you could include a cell like this:
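One way such a cell could look, again as a sketch that assumes `jupyter` is on your PATH. The function name `convert_all` and its `run` parameter are hypothetical; the notebook names and output folder match the log output shown in this post.

```python
import subprocess
from pathlib import Path

# Notebooks to export and the folder that receives the scripts.
NOTEBOOKS = ["notebook1.ipynb", "notebook2.ipynb"]
OUTPUT_DIR = "nbconvert_test"


def convert_all(notebooks, output_dir, run=subprocess.run):
    """Export each existing notebook as a script; return the converted names.

    `run` is a hypothetical hook that defaults to subprocess.run.
    """
    converted = []
    for nb in map(Path, notebooks):
        if not nb.exists():
            print(f"Skipping missing notebook: {nb}")
            continue
        run(
            ["jupyter", "nbconvert", "--to", "script", str(nb),
             "--output-dir", str(output_dir)],
            check=True,
        )
        converted.append(nb.name)
    return converted


convert_all(NOTEBOOKS, OUTPUT_DIR)
```

If you would rather pick up every notebook in the folder, `sorted(Path(".").glob("*.ipynb"))` could replace the explicit list.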
[NbConvertApp] Converting notebook <current directory>/notebook1.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook1.py
[NbConvertApp] Converting notebook <current directory>/notebook2.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook2.py
Both of these approaches make it a little clearer how the files are being tracked, and neither resorts to user-specific configuration files that seem prone to error.