class: middle

# Python Toolchain 101

---
class: center, middle

This is a lightning talk about the basics of what it means to "install" and "run" a Python program in a UNIX-y environment.

If you aren't comfortable with PATH, shebang, Python package discovery, and virtualenv - this may be relevant to you!

---
class: left

Let's start with the classic python packaging problem. You're working on two different projects, you're not using a virtualenv, and they have the following dependencies:

```python
# project-A-requirements.txt
requests==2.25.1

# project-B-requirements.txt
requests==1.1.0
```

And you run `pip install -r project-A-requirements.txt` and then later you run `pip install -r project-B-requirements.txt`. What's going to be the problem?

--

As you've likely experienced, the second install wins, and you will *only* have requests `1.1.0` installed - a problem if your first project really needed that modern version of requests!

---

This inability to handle conflicting dependencies for multiple projects is what virtualenvs are supposed to solve. We're going to dig into a few specific questions to hopefully illuminate the "magic":

--

How do Python's imports work?

--

How does your computer locate python?

--

How does pip know where this path is?

---
class: middle

# PYTHONPATH

---
class: left

The perfect place to start is with the import statement. What's happening here?

```python
# myfile.py
import requests
```

--

Skipping quite a few special cases for the sake of simplicity, roughly the way this works is that wherever `python` happens to reside...

```bash
$ which python
/home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/bin/python
```

--

`pip install` will download and save packages relative to it...

```bash
$ python
>>> import sys
>>> print(sys.path)
[
    ...
    '/home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/lib/python3.7/site-packages',
]
```

---
class: left

And when you go to `import requests`, the interpreter will literally look on your filesystem for a folder *with that name*. This is why only one version of `requests` is allowed to exist at a time:

```bash
$ ls /home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/lib/python3.7/site-packages | grep requests
requests
```

Indeed, if you look inside, you see the python code:

```bash
$ ls /home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/lib/python3.7/site-packages/requests
adapters.py auth.py certs.py cookies.py hooks.py models.py __pycache__ status_codes.py utils.py
api.py cacert.pem compat.py exceptions.py __init__.py packages sessions.py structures.py
```

Installing a pure python package is very close to "downloading the python source code to a special folder", and importing a package is basically "run the python file relative to a special folder".

---
class: middle

# PATH

---
class: left

When you run the following on your command line, your computer locates a python binary and starts executing it.

```bash
$ python
```

--

A binary program has been translated to machine code that (hopefully) your CPU can understand and directly execute. If you perform the following:

```bash
$ which python
/usr/bin/python
```

You'll see that this binary is just a file: you can `mv` or `cp` it, and wherever it is, you can execute it!

```bash
$ mkdir -p ~/test
$ cp /usr/bin/python ~/test/python
$ ~/test/python
# Starts a REPL!
$ cat ~/test/python
# gibberish:
���'��'���H|(H|(���#��#��#,�,H �������0���������@���
```

---
class: left

This raises an interesting question: when you have a binary, you can execute it by specifying the *path* to the binary. But in the previous example, my computer somehow decided that `python` (without any prefix) should correspond to the binary located at `/usr/bin/python`. Why?
--

```bash
$ echo $PATH
/home/justin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/justin/.local/bin
```

The secret is the `PATH` variable. Whenever you type in a command that *isn't* a path to a file, your shell will look for a *file* with the exact same name by looking in every single folder in your PATH, in order, and it runs the first match.

--

"Installing" a program can largely consist of three things:

1. Download a binary.
2. Either move it to a folder in your PATH (`/usr/bin`) or add its current folder *to* your PATH.
3. Make the file executable (`chmod +x filename`)

---
class: middle

# virtualenv

---
class: left

We're now ready to stitch together the above tricks to talk about virtualenvs! A virtualenv effectively does two things:

- When you create one, it *copies* (or, on many systems, symlinks) a python interpreter and places it into a dedicated "sandbox". You can think of this as pretty close to `cp /usr/bin/python ~/.virtualenvs/fakeproject/bin/python`. In reality there's more, but this is the essence.
- When you "activate" your virtualenv, it *prepends* the path to this sandbox to your PATH variable.

```bash
~/code/fakeproject$ echo $PATH
# /home/justin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
~/code/fakeproject$ pipenv shell
(fakeproject) ~/code/fakeproject$ echo $PATH
# /home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/bin:/home/justin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

---
class: left

In your activated virtualenv, when you now run:

```bash
$ python myfile.py
# myfile.py
# import requests
```

It runs the python interpreter found at `~/.local/share/virtualenvs/fakeproject-74je3CaW/bin/python`, and it looks for imports *relative* to this path:

```bash
~/.local/share/virtualenvs/fakeproject-74je3CaW/lib/python3.7/site-packages
```

Indeed, if you run `pip install requests`, you'll see it appear in this site-packages folder.

--

Wait a minute. How does *pip* know where to install this?
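---
class: left

Before answering that, a quick recap in code. This is a minimal sketch of the PATH lookup and of what "activation" changes, not how any real shell is implemented. `find_command`, `make_fake_python`, and the `system`/`venv` folder names are made up for this example; the standard library's `shutil.which` performs a real version of this lookup.

```python
import os
import stat
import tempfile


def find_command(name, path_value):
    """Return the first executable file called `name` in a PATH-style string."""
    for folder in path_value.split(os.pathsep):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def make_fake_python(folder):
    """Create an executable placeholder file named `python` inside `folder`."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, "python")
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


# Two hypothetical installs living in a throwaway directory.
root = tempfile.mkdtemp()
system_python = make_fake_python(os.path.join(root, "system", "bin"))
venv_python = make_fake_python(os.path.join(root, "venv", "bin"))

# Before "activation": only the system folder is on the search path.
search_path = os.path.dirname(system_python)
before = find_command("python", search_path)

# "Activation": prepend the sandbox folder so it is searched first.
search_path = os.path.dirname(venv_python) + os.pathsep + search_path
after = find_command("python", search_path)
```

Here `before` is the placeholder under `system/bin`, while `after` is the one under `venv/bin`. Nothing was uninstalled; the sandbox's folder simply comes first in the search order.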
---
class: middle

# shebang

---
class: left

Earlier we claimed that *binary* files could be executed by specifying their path, but that was an incomplete story. In reality, any UNIX-like operating system has a *program loader* - this is what loads binary files into memory for the CPU to execute - and it has another important trick called the shebang.

--

```bash
#!/path/to/interpreter
```

When an executable file starts with a line like this, the loader will *invoke the specified interpreter* with the path to your file as its argument. An example:

```python
# Contents of myfile (the shebang must be the very first line):
#!/usr/bin/python
print("THIS IS PYTHON")

$ chmod +x myfile
$ ./myfile
# THIS IS PYTHON
```

You will often see `/bin/sh`, `/bin/bash`, or `/usr/bin/env python` as shebangs.

---
class: left

So what does this have to do with pip?

```bash
(fakeproject) ~/code/fakeproject$ which pip
/home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/bin/pip
(fakeproject) ~/code/fakeproject$ cat /home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/bin/pip
#!/home/justin/.local/share/virtualenvs/fakeproject-74je3CaW/bin/python
# -*- coding: utf-8 -*-
import re
...
```

--

It turns out that pip is actually *just a python script*. You don't need a `.py` extension, you don't need to call `python pip`. Thanks to the magic of PATH and shebang, `pip` just "looks like" a binary.

---
class: left

Most importantly, we can observe two things:

1. When you create a virtualenv, it *also* creates a copy of `pip`.
2. This copy of `pip` is *explicitly bound to the virtualenv's copy of python*.

Therefore, when you call `pip install requests`, since `pip` is just a python script, and is running using the virtualenv's python, it can invoke:

```python
import sys
sys.path
```

and find the directory to which it should install packages!

---
class: middle

# Finis
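---
class: left

A postscript: the claim that a script can discover its install directory from the interpreter running it can be checked with a few lines. This is only a sketch of the idea and not pip's actual code, which is more involved; the variable names are made up, and `sysconfig` is a standard library module that reports where the running interpreter keeps its packages.

```python
import sys
import sysconfig

# The interpreter executing this script (for pip, the one named in its shebang).
interpreter = sys.executable

# The root of the environment that interpreter belongs to.
environment_root = sys.prefix

# The folder where pure python packages get installed for this interpreter.
install_dir = sysconfig.get_paths()["purelib"]

print(interpreter)
print(environment_root)
print(install_dir)
```

Run it once with your system python and once inside an activated virtualenv: the same script reports different folders, because the answer depends on which interpreter is running it.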