Coding tools
Solved: Jupyter nightmare on Git
Perform proper version control on your team’s notebooks
The struggle is real
Let me give you two quick tests to tell a data scientist from a software engineer. Give them a simple coding task in Python, something that mostly takes common sense, and then:
- Give them only a Jupyter notebook to work with
- Ask them to commit to a Git branch and then open a pull request on GitHub
The data scientist will be like a fish in water in the notebook but will surely struggle with Git/GitHub, while the engineer will be bewildered by the “run-any-cell-anytime” nature of the notebook but will fly through Git as if you had simply asked them to breathe.
Yet both will throw in the towel if you ask them to review a PR involving notebooks.
And there is a good reason for it. It is hard.
What data scientists discover when they try to follow a software engineering workflow is that Jupyter notebooks are, in essence, JSON files. These store a plethora of metadata, such as how many times the notebook was opened (not edited, opened), the order in which cells were run (an ordinal), and plenty more. They also store cell outputs, with binary data such as images embedded as base64-encoded text! Any graph you…
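To make the JSON point concrete, here is a minimal sketch using only the standard library. The notebook below is hand-written for illustration in the nbformat v4 layout (real files carry more metadata), and `strip_notebook` is a hypothetical helper, not part of any library. It clears outputs and execution counts from code cells, which is the same idea behind tools such as nbstripout: once those volatile fields are gone, re-running a cell no longer changes the file, so a diff shows only the code that actually changed.

```python
import json

# Hand-written, minimal notebook in the nbformat v4 layout (illustrative only;
# real notebooks usually carry more metadata, e.g. language_info).
NOTEBOOK = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {
            "cell_type": "code",
            "id": "a1b2c3d4",
            "execution_count": 7,  # ordinal: the 7th execution in this kernel session
            "metadata": {},
            "source": ["import math\n", "math.pi"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["3.141592653589793"]},
                    "metadata": {},
                }
            ],
        },
        {"cell_type": "markdown", "id": "e5f6a7b8", "metadata": {}, "source": ["# Notes"]},
    ],
}


def strip_notebook(nb: dict) -> dict:
    """Return a copy of `nb` with outputs and execution counts cleared in code cells.

    Hypothetical helper: the source of every cell is left untouched, so only
    the parts a reviewer cares about remain.
    """
    clean = json.loads(json.dumps(nb))  # deep copy of a JSON-compatible dict
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as null, i.e. "never run"
    return clean


if __name__ == "__main__":
    # Jupyter writes notebooks with indent=1; compare the size before and after.
    original = json.dumps(NOTEBOOK, indent=1)
    stripped = json.dumps(strip_notebook(NOTEBOOK), indent=1)
    print(len(original.splitlines()), "lines before,", len(stripped.splitlines()), "lines after")
```

In practice, this kind of cleaning is usually wired into Git as a filter or pre-commit hook rather than run by hand, so that every commit is stripped consistently.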