Coding tools

Solved: `Jupyter` nightmare on Git

7 min read · Jul 21, 2022

Perform proper version control on your team’s notebooks

The struggle is real

Let me give you two quick tests to differentiate a data scientist from a software engineer. Give them a simple coding task in Python, something that mostly takes common sense, and then:

Only give them a Jupyter notebook to work with
Ask them to commit to a git branch and then open a pull request on GitHub

The data scientists will be like fish in water in the notebook but will surely struggle with git/GitHub, while the engineers will be bewildered by the “run-any-cell-anytime” nature of the notebook but fly through git as if you had asked them to simply breathe.

Yet both will be throwing in the towel if you ask them to review a PR involving notebooks.

And there is a good reason for it. It is hard.

What data scientists discover when trying to follow a software engineering workflow is that Jupyter Notebooks are, in essence, JSON files. These store a plethora of metadata, such as how many times the notebook was opened (not edited, opened), the order in which cells were run (an ordinal), and plenty more. They also store outputs in binary form! Any graph you…
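To make the "notebooks are JSON" point concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not necessarily the approach this article goes on to recommend. The sample notebook is hand-written to mimic the usual `cells` / `outputs` / `execution_count` layout, and the helper name `strip_volatile_fields` is hypothetical.

```python
import json

# A tiny hand-written notebook mimicking the on-disk JSON layout (illustrative data).
NOTEBOOK_JSON = """
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "source": ["print('hello')"],
      "outputs": [
        {"output_type": "stream", "name": "stdout", "text": ["hello\\n"]}
      ]
    },
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]}
  ],
  "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""


def strip_volatile_fields(notebook: dict) -> dict:
    """Return a copy with outputs and execution counts cleared from code cells.

    Hypothetical helper: these two fields change on every run, so they are
    the ones that clutter a diff even when the code itself is untouched.
    """
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy via JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


notebook = json.loads(NOTEBOOK_JSON)
cleaned = strip_volatile_fields(notebook)

# The code cell keeps its source but loses the run-dependent fields.
print(json.dumps(cleaned["cells"][0], indent=1))
```

Re-running a cell without touching its source changes `execution_count` and possibly `outputs`, which is why a notebook diff can be noisy even when nothing meaningful changed.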

Written by Dany Majard
