How to Setup Neat Python Environments for Software Development and Data Science

  • Post category: Tooling

Introduction

Probably every Python developer or data scientist has run into a “ModuleNotFoundError”, or has watched a Jupyter notebook fail on the very first cell when run on a colleague's computer. Chances are high that such errors are caused by differing Python or package versions. There are several solutions to this problem. For example, you may provide a requirements.txt listing all dependencies needed to run your project. But how can you ensure that installing those packages does not affect the other projects you are currently working on?

In this post we'll discuss how to set up your development machine as a data scientist, data engineer or software engineer working with Python. The goal is a clean, separate environment for each project that works out of the box, both for you and for other developers. To achieve this we focus on two powerful tools: pyenv and Poetry. Pyenv manages multiple Python installations on your machine. This makes it easy to switch between projects that require different Python versions. Poetry manages a project's packages and dependencies in dedicated files that can be committed to version control. Other developers can then set up the same environment for the project on their own machines.

Manage Python Versions

A comprehensive pyenv installation guide for all major operating systems is provided in the GitHub repository of the pyenv project.

Once you have pyenv installed, let's get started by listing all Python versions that are available for download. In your terminal, type:

pyenv install --list

To check which versions are already installed on your machine, type:

pyenv versions

Output:
system
* 3.7.3 (set by /Users/user/.pyenv/version)
3.8.2
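The asterisk marks the active version, and the note in parentheses shows which file set it. You may want to read this from a script, for example in a project setup check. The minimal Python sketch below parses the output under that assumption. The helper name parse_pyenv_versions is hypothetical and not part of pyenv.

```python
from __future__ import annotations


def parse_pyenv_versions(output: str) -> tuple[list[str], str | None]:
    """Split `pyenv versions` output into (installed versions, active version)."""
    installed: list[str] = []
    active: str | None = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # The active version is the line marked with a leading "*".
        is_active = line.startswith("*")
        # Drop the "*" marker and any trailing "(set by ...)" note.
        name = line.lstrip("*").strip().split(" (", 1)[0]
        installed.append(name)
        if is_active:
            active = name
    return installed, active


sample = """\
  system
* 3.7.3 (set by /Users/user/.pyenv/version)
  3.8.2
"""
print(parse_pyenv_versions(sample))  # (['system', '3.7.3', '3.8.2'], '3.7.3')
```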

If you need Python 3.6.1 for a specific project or for general testing purposes, simply type:

pyenv install 3.6.1

To set 3.6.1 as your global default version, type:

pyenv global 3.6.1

Most of the time, however, you probably want to pin a certain version for a specific project. Navigate to the project folder and set 3.8.2 as the local version, which applies within this folder and its subfolders:

cd /path/to/project
pyenv local 3.8.2

Now check the version you are currently using:

pyenv version

Within your project you should see 3.8.2. Outside of this folder the output should be 3.6.1, your global version. The local version is stored in a hidden file named .python-version in your project folder. If you use a version control system like git, consider adding .python-version to your .gitignore file. Instead, document the required Python version, or the range of compatible versions, in the project's README.
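pyenv decides which version is active by checking several sources in order. It looks first at the PYENV_VERSION environment variable (set by `pyenv shell`). Next it looks for a .python-version file in the current directory or any parent directory. Then it reads the global version file written by `pyenv global`. If none of these is found, it falls back to the system Python. The sketch below mimics that lookup in simplified form. It is not pyenv's implementation, and the function names and the temporary folder layout are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from collections.abc import Mapping
from pathlib import Path


def read_version_file(path: Path) -> str | None:
    """Return the first version listed in a pyenv version file, or None."""
    if not path.is_file():
        return None
    entries = path.read_text().split()
    return entries[0] if entries else None


def resolve_python_version(
    start: Path, pyenv_root: Path, env: Mapping[str, str] | None = None
) -> str:
    """Simplified imitation of pyenv's lookup order for the active version."""
    env = os.environ if env is None else env
    # 1. PYENV_VERSION (set e.g. by `pyenv shell`) overrides everything else.
    if env.get("PYENV_VERSION"):
        return env["PYENV_VERSION"]
    # 2. The nearest .python-version in the start directory or any parent.
    start = start.resolve()
    for directory in (start, *start.parents):
        version = read_version_file(directory / ".python-version")
        if version:
            return version
    # 3. The global file written by `pyenv global`; 4. otherwise "system".
    return read_version_file(pyenv_root / "version") or "system"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root, project = Path(tmp) / "pyenv-root", Path(tmp) / "project"
        (project / "src").mkdir(parents=True)
        root.mkdir()
        (root / "version").write_text("3.6.1\n")  # like `pyenv global 3.6.1`
        (project / ".python-version").write_text("3.8.2\n")  # like `pyenv local 3.8.2`
        print(resolve_python_version(project / "src", root, env={}))
        print(resolve_python_version(Path(tmp), root, env={}))
```

Run as a script, it should print 3.8.2 for the subfolder of the project and 3.6.1 for the surrounding folder. This mirrors the `pyenv version` behavior described above.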

Manage Packages and Dependencies

Using plain “pip install” with a global Python installation is usually not a good idea once your projects grow or other team members work on the same code base. Poetry is a great tool for encapsulating the dependencies of each project. See the official Poetry website for installation instructions.

Let's start with an empty folder that will be the root directory of your next project. First, set the Python version with “pyenv local” as described above. With Poetry installed, you can now create a new Python application folder inside your project. Navigate to the project folder in your terminal and type:

poetry new demoapp

Navigate to the new demoapp folder and add the first packages you want to install:

cd demoapp
poetry add pandas=1.0.0
poetry add --dev flake8=3.8.4

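The first command creates the virtual environment if it does not exist yet, then installs pandas with all of its dependencies. The second command adds flake8, which you can use to check your code style during development. Make sure to provide an exact version for each package, or at least a range of versions. The commands also work without any constraint. In that case Poetry picks the latest compatible release and records a caret requirement such as ^1.0.0, but resolving the dependencies may take much longer. On Poetry 1.2 and later, --dev is deprecated; use poetry add --group dev flake8=3.8.4 instead. The folder structure of your project should now look like this: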
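Poetry's documentation defines a caret requirement as allowing updates that do not change the leftmost non-zero version component. For example, ^1.2.3 means >=1.2.3,<2.0.0, and ^0.2.3 means >=0.2.3,<0.3.0. The hypothetical helpers below reproduce that rule for plain X.Y.Z release numbers only. They ignore pre-releases and are not Poetry's actual implementation.

```python
from __future__ import annotations


def _pad(parts: list[int]) -> tuple[int, ...]:
    """Pad a release to three components, e.g. [1, 2] -> (1, 2, 0)."""
    return tuple(parts + [0] * (3 - len(parts)))


def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain release such as '1.0' into (1, 0, 0)."""
    return _pad([int(p) for p in version.split(".")])


def caret_bounds(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower, exclusive upper) bounds of a caret constraint."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    # Bump the leftmost non-zero component; if every written component is
    # zero, bump the last one written (so ^0.0 becomes <0.1.0).
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]
    return _pad(parts), _pad(upper)


def satisfies(version: str, constraint: str) -> bool:
    """Check whether a plain release falls inside a caret range."""
    lower, upper = caret_bounds(constraint)
    return lower <= parse_release(version) < upper


print(caret_bounds("^1.0.0"))  # ((1, 0, 0), (2, 0, 0))
print(caret_bounds("^0.2.3"))  # ((0, 2, 3), (0, 3, 0))
print(satisfies("1.5.2", "^1.0.0"), satisfies("2.0.0", "^1.0.0"))  # True False
```

This shows why pinning matters for 0.x packages: a caret range there only admits patch or minor updates within the same leading non-zero component.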

project-root-folder
|____ demoapp
      |____ demoapp
      |     |____ __init__.py
      |____ poetry.lock
      |____ pyproject.toml
      |____ README.rst
      |____ tests
            |____ __init__.py
            |____ test_demoapp.py

A closer look at pyproject.toml reveals a structured TOML file. It lists all the packages you specified for this application, plus the development-only packages, which can be left out when installing in a production environment. The poetry.lock file lists every package installed in the environment, with the exact versions resolved for all dependencies of your specified packages. Both files should be committed to version control. Anyone who wants to work on the code base can then clone the repository and install exactly the same packages by typing:

poetry install

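To exclude the development-only packages, for example on a production server, adjust the install command like this:

poetry install --no-dev

On Poetry 1.2 and later, --no-dev is deprecated in favor of poetry install --without dev.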

Now you can run your Python scripts inside the virtual environment by typing:

poetry run python example.py
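For a quick check that poetry run really uses the project's environment, example.py could be a small script like the one below. It is a hypothetical example and assumes Python 3.8 or later for importlib.metadata. Inside a virtual environment, sys.prefix points to the environment, while sys.base_prefix still points to the interpreter it was created from (here, presumably the pyenv-managed Python).

```python
"""example.py: a hypothetical script that reports which environment it runs in."""
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version


def in_virtualenv() -> bool:
    """True when running inside a virtual environment such as Poetry's."""
    # In a venv, sys.prefix is the environment folder, while sys.base_prefix
    # still points at the interpreter the environment was created from.
    return sys.prefix != sys.base_prefix


def installed_version(package: str) -> str | None:
    """Installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    print("pandas:", installed_version("pandas") or "not installed")
```

Run with poetry run python example.py, it should report True and the pandas version from the lock file. Running it with plain python outside the environment typically reports False.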

Remarks

VSCode

If you use a code editor like VSCode, you can tell Poetry to create the virtual environment inside your project folder, where the editor detects it automatically. Simply run the following command before creating a new environment:

poetry config virtualenvs.in-project true

It may take a little while for VSCode to detect the new .venv folder, which Poetry creates next to pyproject.toml. If it is not picked up, reopen the project folder and open the command palette (Command+Shift+P or Ctrl+Shift+P). Search for “Python: Select Interpreter” and choose the .venv environment from the list. Afterwards, VSCode should open your project with the right environment directly.
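If the environment does not appear in the list, you can enter the interpreter path manually. The path differs by platform. This minimal sketch shows where the interpreter usually lives inside a .venv folder. It assumes Poetry created .venv next to pyproject.toml, and the function name venv_python is hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def venv_python(project_root: Path, windows: bool | None = None) -> Path | None:
    """Return the interpreter inside <project_root>/.venv, or None if absent."""
    if windows is None:
        windows = os.name == "nt"
    venv = project_root / ".venv"
    # Virtual environments put the interpreter in Scripts\python.exe on
    # Windows and in bin/python on macOS and Linux.
    candidate = (venv / "Scripts" / "python.exe") if windows else (venv / "bin" / "python")
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        print(venv_python(project, windows=False))  # None: no .venv yet
        (project / ".venv" / "bin").mkdir(parents=True)
        (project / ".venv" / "bin" / "python").touch()
        print(venv_python(project, windows=False))  # .../.venv/bin/python
```

Run from the demoapp folder, venv_python(Path.cwd()) would give the path to select in VSCode.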

Jupyter Notebooks

If you use Jupyter notebooks for data science, the Poetry environment does not work with them out of the box, because Jupyter and ipykernel have to be installed in it. Navigate to your application folder and add both packages by typing:

poetry add jupyter ipykernel

You can now register a kernel for this environment under a name, ideally the same as your application name:

poetry run ipython kernel install --user --name=demoapp
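To confirm that the registered kernel launches the interpreter from the Poetry environment, `jupyter kernelspec list` shows where its kernel.json file lives. Alternatively, a small Python sketch can read that file directly. The sketch assumes Jupyter's default per-user data directories: ~/.local/share/jupyter on Linux, ~/Library/Jupyter on macOS, and %APPDATA%\jupyter on Windows. It also assumes JUPYTER_DATA_DIR is not overriding them. The function names are hypothetical.

```python
from __future__ import annotations

import json
import os
import sys
from pathlib import Path


def user_kernels_dir() -> Path:
    """Default per-user kernel directory (assumes no JUPYTER_DATA_DIR override)."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Jupyter" / "kernels"
    if os.name == "nt":
        return Path(os.environ["APPDATA"]) / "jupyter" / "kernels"
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(data_home) / "jupyter" / "kernels"


def kernel_interpreter(kernel_name: str, kernels_dir: Path | None = None) -> str | None:
    """Return the Python executable a registered kernel starts, or None."""
    base = kernels_dir if kernels_dir is not None else user_kernels_dir()
    spec_file = base / kernel_name / "kernel.json"
    if not spec_file.is_file():
        return None
    spec = json.loads(spec_file.read_text())
    # The first argv entry is the interpreter Jupyter launches for this kernel.
    return spec["argv"][0]


if __name__ == "__main__":
    print(kernel_interpreter("demoapp") or "kernel 'demoapp' not found")
```

If the printed path points inside the Poetry environment, for example a .venv folder, the kernel is set up correctly.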

If you have no other Jupyter installation available, start Jupyter from the environment itself:

poetry run jupyter notebook

Alternatively, use any other Jupyter installation on your system and select the newly registered kernel when creating a new notebook.

Known Issues

If you are on macOS 11.x, have a look at this issue on GitHub.
