Overview
Teaching: 30 min
Exercises: 10 min
Questions
What is Conda?
Why should I use it?
Objectives
explore the benefits of python environments
discuss how conda can allow you to make the “perfect python environment”
We will start this tutorial by looking at a picture of the perfect python environment.
Definition of a Python virtual environment: a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
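To see what "self-contained" means in practice, here is a minimal Python sketch that reports which environment the running interpreter belongs to. It rests on two assumptions worth stating: `venv` (and virtualenv 20+) environments make `sys.prefix` differ from `sys.base_prefix`, and `conda activate` exports a `CONDA_PREFIX` variable. The helper name `describe_environment` is purely illustrative.

```python
import os
import sys


def describe_environment() -> dict:
    """Summarize the environment the current interpreter runs in (illustrative helper)."""
    return {
        # Root directory of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Root of the Python installation the environment was created from
        "base_prefix": sys.base_prefix,
        # venv/virtualenv point sys.prefix at the environment directory, so it differs
        # from base_prefix; a conda environment is a full installation, so they are equal there
        "is_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; None when no conda environment has been activated
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it inside and outside an activated environment is a quick way to see that each environment lives in its own directory tree.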
Python Environment Discussion
What are the benefits of having a well defined python environment?
- Avoid future breakage if any dependency changes
- Allows better collaboration within a team
- pip as a Python package manager, along with virtualenv as the Python environment manager.
- pipenv as a Python package and environment manager.
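For comparison, Python's standard library ships a `venv` module that does the same job as virtualenv: it creates a self-contained directory with its own interpreter and a `pyvenv.cfg` marker file. The sketch below uses `venv` as an illustrative stand-in (it is not virtualenv itself); `with_pip=False`, which is also the default for `venv.create`, keeps the example from needing network access, and `create_bare_env` is a hypothetical helper.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_bare_env(env_dir: Path) -> Path:
    """Create a virtual environment without pip and return the path of its interpreter."""
    venv.create(env_dir, with_pip=False)
    # Executables live in Scripts\ on Windows and bin/ everywhere else
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "geoproject-env"
        python = create_bare_env(env_dir)
        # pyvenv.cfg records which base installation the environment was built from
        print((env_dir / "pyvenv.cfg").read_text())
        print("interpreter:", python, "exists:", python.exists())
```

Note that `venv` only isolates Python packages; it does nothing about non-Python libraries, which is where Bob runs into trouble below.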
- Miniconda: a lightweight distribution of conda that contains only the necessary Python packages.
- Anaconda: a data science platform distribution of conda that comes with many scientific Python packages.
Bob is a post-doc. He has been programming in Python for a few years now, and he is very comfortable managing his own Python environment with virtualenv, but now he’s with the “cool” kids using pipenv.
Recently, his studies have been shifting toward a geospatial focus, and he will need Python libraries such as gdal, fiona, and netcdf. Let’s see what happens.
# First Bob installs the new pipenv
pip install pipenv
# Next he creates a folder for his new geoproject and moves into it
mkdir geoproject
cd geoproject
# Next Bob creates a requirements.txt using the vim text editor so he can share it later
vi requirements.txt
# In requirements.txt
# ipython
# requests
# fiona
# gdal
# netCDF4
# Now he installs those packages
pipenv install -r requirements.txt
# Bob got an error
... get_dependencies
    legacy_results = self.get_legacy_dependencies(ireq)
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/piptools/repositories/pypi.py", line 335, in get_legacy_dependencies
    self.resolver.resolve(reqset)
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/resolve.py", line 107, in resolve
    self._resolve_one(requirement_set, req)
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/resolve.py", line 264, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/resolve.py", line 214, in _get_abstract_dist_for
    self.require_hashes
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/operations/prepare.py", line 328, in prepare_linked_requirement
    abstract_dist.prep_for_dist(finder, self.build_isolation)
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/operations/prepare.py", line 155, in prep_for_dist
    self.req.run_egg_info()
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/req/req_install.py", line 486, in run_egg_info
    command_desc='python setup.py egg_info')
  File "/usr/local/lib/python3.6/site-packages/pipenv/patched/notpip/_internal/utils/misc.py", line 698, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pipenv.patched.notpip._internal.exceptions.InstallationError: Command "python setup.py egg_info" failed with error code 1 in /tmp/tmp_5d7wspdbuild/GDAL/
# Bob looked at https://pypi.org/project/GDAL/, but it's still really confusing to set this up... HELP!
On the other side of the world, we meet Sandy. She is an advanced undergrad who has attended one of the hackweeks at UW eScience, and she has just started to really program in Python. Her senior thesis project requires her to analyze geospatial data. Like Bob, she knows that she will need gdal, fiona, and netcdf. Having learned about conda at the hackweek, she started following the conda workflow to create a new project. Let’s see what happens.
# Sandy has installed conda on her Linux machine, so her first step is to make a new directory for the project and move into it
mkdir geoproject
cd geoproject
# Next Sandy creates an environment.yml using the vim text editor so she can share it later
vi environment.yml
# In environment.yml
# name: geoproj
# channels:
#   - conda-forge
# dependencies:
#   - python=3.6
#   - ipython
#   - requests
#   - fiona
#   - gdal
#   - netCDF4
# Now she installs those packages
conda env create -f environment.yml
# Sandy succeeded in the install after a few minutes
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate geoproj
#
# To deactivate an active environment, use
#
#     $ conda deactivate
# Sandy activated her new geoproj Python environment and checked whether gdal works.
conda activate geoproj
gdalinfo --version
GDAL 2.2.4, released 2018/03/19
ogr2ogr --version
GDAL 2.2.4, released 2018/03/19
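Sandy's last step, confirming that the new environment really provides what her project needs, can also be done from Python. Here is a minimal sketch using `importlib.util.find_spec`, which checks whether a top-level module can be found without importing it. The module names passed in (`osgeo` is the import name of GDAL's Python bindings) are assumptions about what Sandy would check, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["osgeo", "fiona", "netCDF4", "requests"])
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All geospatial packages are available.")
```

Run it once with `geoproj` activated and once without to see the difference an environment makes.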
- pipenv is basically just a nice wrapper that uses virtualenv under the hood.
- pip is simply a Python package manager.
- pip does not handle library dependencies outside of Python (such as the GDAL C library) nearly as well as it handles the Python packages themselves.
- pip wheels can solve some of the lower-level dependency problems that we ran into in Bob’s case, but the GDAL package on PyPI does not ship wheels that bundle these dependencies, so pip falls back to building from source and users have to set the library up themselves!
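One way to see Bob's problem directly is to ask whether the non-Python pieces of GDAL exist on the system at all, since pip only installs Python code and expects them to be there already. The sketch below uses `ctypes.util.find_library` and `shutil.which`; the assumption that a source build of the GDAL bindings looks for a `gdal-config` helper on the PATH is ours, and `native_gdal_status` is a hypothetical helper.

```python
import ctypes.util
import shutil
from typing import Dict, Optional


def native_gdal_status() -> Dict[str, Optional[str]]:
    """Report whether the non-Python parts of GDAL can be found (None means not found)."""
    return {
        # Shared library name as the system loader sees it (e.g. a libgdal.so on Linux)
        "libgdal": ctypes.util.find_library("gdal"),
        # Helper script assumed to be used by source builds to locate headers and libraries
        "gdal-config": shutil.which("gdal-config"),
    }


if __name__ == "__main__":
    for component, location in native_gdal_status().items():
        print(f"{component}: {location or 'NOT FOUND'}")
```

On a machine like Bob's, both entries would likely come back as not found, which is exactly the gap a conda environment fills by installing the C library alongside the Python bindings.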
NOTE: Conda can manage pip packages, but pip cannot manage conda packages
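Concretely, conda can manage pip packages through a `pip:` subsection of `environment.yml`, for dependencies that are only available on PyPI. The Python sketch below renders such a file as plain text (no YAML library needed). `render_environment_yml` and the package name `some-pypi-only-package` are hypothetical; listing `pip` itself as a conda dependency follows conda's recommendation, since conda warns when a pip section is present without it.

```python
from typing import List, Optional


def render_environment_yml(name: str, channels: List[str], conda_deps: List[str],
                           pip_deps: Optional[List[str]] = None) -> str:
    """Build the text of an environment.yml with an optional pip subsection."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        # pip must be installed by conda before it can install the pip section
        if "pip" not in conda_deps:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml("geoproj", ["conda-forge"],
                                 ["python=3.6", "gdal"], ["some-pypi-only-package"]))
```

The resulting file can be passed to `conda env create -f environment.yml` like any other; conda installs the conda dependencies first and then hands the pip list to pip.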
Before we start, let’s make sure you have conda installed, either via Miniconda or Anaconda.
# Check that conda is installed and show information about your installation
conda info
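If you would rather run this check from Python (for example, in a setup script), here is a small sketch that looks for the `conda` executable on your PATH and asks it for its version with `conda --version`. The helper name `conda_version` is hypothetical; it returns None when conda is not found.

```python
import shutil
import subprocess
from typing import Optional


def conda_version() -> Optional[str]:
    """Return the output of `conda --version`, or None if conda is not on the PATH."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Older conda builds may print the version to stderr instead of stdout
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print(conda_version() or "conda was not found on your PATH")
```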
Conda Help and Manual
To see the full documentation for any command, type the command followed by --help. For example, to learn about the conda update command:
$ conda update --help
What is a conda environment?
# List out available environments
conda env list
# The starred (*) environment is the currently active environment

# Create a conda environment from the command line (Not Best Practice)
conda create --name myenv --channel conda-forge python=3.6

# Activate a conda environment
conda activate myenv

# Deactivate the active conda environment
conda deactivate

# Create a conda environment from an environment file (Recommended Best Practice)
conda env create --file environment.yml

# Remove a conda environment
conda env remove --yes --name myenv
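`conda env list` also accepts a `--json` flag, which is easier to consume from scripts. The sketch below assumes that JSON has the shape `{"envs": [<path>, ...]}` and labels each environment by the last component of its path, so the base environment shows up under its install directory name (e.g. `miniconda3`). `env_names` is a hypothetical helper and the sample paths are made up.

```python
import json
from pathlib import PurePath
from typing import Dict


def env_names(env_list_json: str) -> Dict[str, str]:
    """Map environment labels (last path component) to their paths."""
    envs = json.loads(env_list_json)["envs"]
    return {PurePath(path).name: path for path in envs}


if __name__ == "__main__":
    # In practice, pass the stdout of `conda env list --json` instead of this sample
    sample = '{"envs": ["/home/sandy/miniconda3", "/home/sandy/miniconda3/envs/geoproj"]}'
    for label, path in env_names(sample).items():
        print(f"{label:12} {path}")
```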
Best practice to share environments
- When starting a new environment, always generate it from an environment file rather than the command line.
- As you add packages to the environment, be sure to update the environment file.
- Unless you have to (e.g. production environments), try to avoid pinning the version of each package. This lets conda pick the most up-to-date versions that work across platforms.
If you follow these guidelines, you should be able to give your environment file to anyone, and they will be able to install your packages with no problem.
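A quick way to follow the last guideline is to scan an environment file's dependency list for version constraints before sharing it. The sketch below treats any `=`, `<`, `>`, `!`, `~` or space inside a dependency string (for example `python=3.6`) as a pin; that rule is a simplification of conda's match-spec syntax, and `pinned_dependencies` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Characters that introduce a version or build constraint in a conda match spec
_CONSTRAINT = re.compile(r"[=<>!~ ]")


def pinned_dependencies(deps: Iterable[object]) -> List[str]:
    """Return the dependency strings that carry a version constraint."""
    pinned = []
    for dep in deps:
        # Skip non-string entries such as a nested {"pip": [...]} section
        if isinstance(dep, str) and _CONSTRAINT.search(dep.strip()):
            pinned.append(dep)
    return pinned


if __name__ == "__main__":
    sandy_deps = ["python=3.6", "ipython", "requests", "fiona", "gdal", "netCDF4"]
    print(pinned_dependencies(sandy_deps))  # prints ['python=3.6']
```

Here only the Python version is pinned, which is a deliberate choice in Sandy's file; anything else the check flags is worth a second look before you share the file.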
What is a conda channel?
# List out your channels and their priorities
conda config --get channels
# If you have a few trusted channels that you prefer to use, you can pre-configure them so that every time you create an environment, you won’t need to explicitly declare the channel.
conda config --add channels conda-forge
NOTE: Your packages will be installed from the highest-priority channel that provides them, even if another channel has a higher version!
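The rule in the note can be modeled in a few lines. This is a deliberately simplified sketch of strict channel priority, not conda's actual solver (which also weighs dependencies, build numbers and the `channel_priority` setting). Channels are passed highest priority first, which is where `conda config --add channels` places a newly added channel, and a package comes from the first channel that has it at all. The package index and version numbers are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical package index: channel -> package name -> available versions
Index = Dict[str, Dict[str, List[str]]]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.2.4' into (2, 2, 4) so versions compare numerically (simple dotted versions only)."""
    return tuple(int(part) for part in version.split("."))


def pick_package(name: str, channels: List[str], index: Index) -> Optional[Tuple[str, str]]:
    """Return (channel, version) for `name`, honoring channel order before version."""
    for channel in channels:  # highest priority first
        versions = index.get(channel, {}).get(name, [])
        if versions:
            # The newest version *within* the winning channel is chosen;
            # lower-priority channels are never consulted for this package
            return channel, max(versions, key=version_key)
    return None


if __name__ == "__main__":
    index = {
        "conda-forge": {"gdal": ["2.2.3", "2.2.4"]},
        "defaults": {"gdal": ["2.3.2"], "zlib": ["1.2.11"]},
    }
    channels = ["conda-forge", "defaults"]
    print(pick_package("gdal", channels, index))  # ('conda-forge', '2.2.4')
    print(pick_package("zlib", channels, index))  # ('defaults', '1.2.11')
```

Even though the made-up `defaults` index has a newer gdal, the higher-priority conda-forge channel wins; a lower-priority channel is only used for packages the higher one lacks.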
Conda Forge (https://anaconda.org/conda-forge)
Conda-forge is a community-led collection of recipes, build infrastructure and distributions for the conda package manager.
Watch the PyCon talk by Filipe, one of the conda-forge lead developers, at https://www.youtube.com/watch?v=qJFkIuzD6tI for more info about how to put your packages into the conda-forge channel!
What is a conda package?
You can search for conda packages at https://anaconda.org/ or from the terminal, as shown below.
# Look at the packages you have installed
conda list
# Let's search for gdal
conda search gdal
# Install a single conda package
conda install -c conda-forge gdal
# Or install multiple packages
conda install -c conda-forge gdal fiona
# Remove a conda package
conda remove -n myenv gdal
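`conda list` also has a `--json` option that prints the installed packages as a JSON list. Assuming each entry carries `name`, `version` and `channel` fields, the sketch below looks up where a package came from, which is handy for confirming that gdal really was installed from conda-forge. `find_installed` is a hypothetical helper and the sample data is made up.

```python
import json
from typing import Optional, Tuple


def find_installed(conda_list_json: str, name: str) -> Optional[Tuple[str, str]]:
    """Return (version, channel) for an installed package, or None if it is absent."""
    for pkg in json.loads(conda_list_json):
        if pkg.get("name") == name:
            return pkg.get("version"), pkg.get("channel")
    return None


if __name__ == "__main__":
    # In practice this string would be the stdout of `conda list --json`
    sample = json.dumps([{"name": "gdal", "version": "2.2.4", "channel": "conda-forge"}])
    print(find_installed(sample, "gdal"))
    print(find_installed(sample, "pandas"))
```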
A conda recipe (a meta.yaml file) contains instructions on how to build the conda package, along with its metadata. For example, a recipe for pandas:
package:
  name: pandas
  version: {{ version }}

source:
  url: https://github.com/pydata/pandas/archive/v{{ version }}.tar.gz
  sha256: d9f67bb17f334ad395e01b2339c3756f3e0d0240cb94c094ef711bbfc5c56c80

build:
  number: 0
  script: python setup.py install --single-version-externally-managed --record=record.txt

about:
  home: http://pandas.pydata.org
  license: BSD 3-clause
  summary: 'High-performance, easy-to-use data structures and data analysis tools.'

extra:
  recipe-maintainers:
    - jreback
    - jorisvandenbossche
    - TomAugspurger
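The `sha256` field pins the exact source archive: conda-build hashes the downloaded tarball and stops with an error if the digest does not match. When writing or updating a recipe, you can compute the digest yourself with Python's `hashlib`. The sketch below streams the file in chunks so large archives do not need to fit in memory; `sha256_of` is a hypothetical helper, and the file name in the usage comment is made up.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python sha256_of.py pandas-source.tar.gz   (hypothetical file name)
    for arg in sys.argv[1:]:
        print(f"{sha256_of(Path(arg))}  {arg}")
```

Paste the printed digest into the `sha256:` field whenever the source URL or version changes.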
conda can be installed in two ways (Anaconda and Miniconda)
this tool will prevent headaches when trying to install packages with dependencies or managing multiple libraries/projects
projects can be separated by individual environments
Reproducible environments can be created easily
Most conda packages are friendly across all platforms
If you’re not convinced about using conda, read this great blog