I’m a big fan of using Jupyter Notebooks for Python projects, but one downside is that version control is a pain. Commits become very large and illegible if you opt to track the entire notebook with the output cells, especially when graphics are included.
There are a number of proposed solutions, but they require either changing your git configuration or generating a Jupyter configuration file and modifying it. Because those settings live outside the repository, they can differ from user to user, and nothing in the code or repo shows how the notebook is being tracked. Below I outline an approach that might be better because the solution is included directly in a notebook cell.
I use a tool called nbstripout (installable via pip) along with a bash command in a notebook cell to solve this problem. Just add a cell like this one to your notebook, set the filenames, save the notebook, then run the cell to create a cleaned version of your notebook without the output.
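As a rough illustration of what such a cell does, here is a minimal pure-Python sketch. It is a stand-in, not the bash/nbstripout command described above and not nbstripout's implementation: it only clears code-cell outputs and execution counts, and it assumes the notebook uses the nbformat 4 layout with a top-level `cells` list. The function name `strip_outputs` is hypothetical, and the two filenames are placeholders to replace with your own.

```python
import json
from datetime import datetime
from pathlib import Path

# Placeholder filenames (assumptions): set these to match your own notebook.
NB_WORKING = "notebook_working.ipynb"
NB_CLEAN = "notebook_clean.ipynb"


def strip_outputs(src, dst):
    """Copy the notebook at src to dst with code-cell outputs removed.

    Hypothetical helper; assumes nbformat 4 (a top-level "cells" list).
    The source file is left untouched.
    """
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop text, tables, and images
            cell["execution_count"] = None  # drop the In[n] counter
    dst_path = Path(dst).resolve()
    dst_path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
    return dst_path


# Only run when the working notebook is present in the current directory.
if Path(NB_WORKING).exists():
    clean_path = strip_outputs(NB_WORKING, NB_CLEAN)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} Cleaned file created at: {clean_path}")
```

Save the notebook before running the cell, since the cell reads the copy on disk. When it runs, it prints a timestamped confirmation line in this form: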
2017-06-23 12:57:40 Cleaned file created at: <current directory>/notebook_clean.ipynb
Then use git to track the cleaned version, and commit the working version only when you need to push viewable results. This way people can easily see what has changed in the clean version and merge those changes into the working version more confidently.
You could even add the working version to .gitignore so that git ignores it entirely if you don’t need to display the output anywhere (if the file is already tracked, remove it from the index first with git rm --cached):
# .gitignore
notebook_working.ipynb
This approach makes it clear how the files are being tracked and doesn’t rely on user-specific configuration files that seem prone to error.