<p>Ctrl-D Blog</p>
<h1>Zipf's law of kitchen utensils</h1>
<p>Jan Likar, 2022-02-06</p>
<p>Did you ever notice how as time passes more and more of your cups,
plates and glasses from your kitchen cupboard become the last surviving utensil from a set - a one of a kind?</p>
<p>There is a simple mathematical explanation.</p>
<p>Let's say we have 1 brown cup and a set of 4 blue cups.</p>
<p>If no family member prefers either color and everyone simply grabs the first cup they find when they
decide to have a warm, tasty cup of tea, every cup is equally likely to be picked, so the cup in use will be blue 4x more often than
brown.</p>
<p>Because cups only very rarely break of their own volition, this means the next cup to break is also 4x more likely to be a blue one.</p>
<p>Unsurprisingly, this leads to colorful cupboard contents.</p>
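<p>A quick simulation can make the argument concrete. The sketch below assumes that every breakage hits one of the five cups uniformly at random; the function name <code>blue_breakage_share</code> and the fixed seed are illustrative choices, not part of the original argument.</p>

```python
import random


def blue_breakage_share(trials, seed=0):
    """Estimate how often the broken cup is blue when all cups are equally likely to break."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    cups = ["brown", "blue", "blue", "blue", "blue"]
    blue_breaks = sum(1 for _ in range(trials) if rng.choice(cups) == "blue")
    return blue_breaks / trials


share = blue_breakage_share(100_000)
print(f"share of breakages that hit a blue cup: {share:.3f}")
```

<p>The share should land close to 0.8, that is, roughly four blue breakages for every brown one.</p>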
<h1>How to create a traditional Python package</h1>
<p>Jan Likar, 2022-01-09</p>
<p>"Traditional" refers to using a plain <code>setup.py</code> file instead of tools like <code>Poetry</code> which use different package configuration file formats.</p>
<p>This blog post is not meant to be a definitive guide. It's only a quick overview of an acceptable Python package configuration.</p>
<h2>Setup.py</h2>
<p>In this traditional setup, every installable package needs a <code>setup.py</code> file in its root directory. This file specifies the package's metadata and dependencies.</p>
<p>The metadata matters mostly if you intend to publish the package to PyPI or another package index.</p>
<p>A simple <code>setup.py</code> file might look like:</p>
<div class="hll"><pre><span></span><span class="ch">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">setuptools</span> <span class="kn">import</span> <span class="n">find_packages</span><span class="p">,</span> <span class="n">setup</span>

<span class="n">setup</span><span class="p">(</span>
    <span class="n">name</span><span class="o">=</span><span class="s1">'MyPackage'</span><span class="p">,</span>
    <span class="n">version</span><span class="o">=</span><span class="s1">'1.0'</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="s1">'An example Python package.'</span><span class="p">,</span>
    <span class="n">author</span><span class="o">=</span><span class="s1">'Jan Likar'</span><span class="p">,</span>
    <span class="n">author_email</span><span class="o">=</span><span class="s1">'jan.likar@protonmail.com'</span><span class="p">,</span>
    <span class="c1"># Runtime dependencies, deliberately left unpinned.</span>
    <span class="n">install_requires</span><span class="o">=</span><span class="p">[</span>
        <span class="s1">'requests'</span><span class="p">,</span>
        <span class="s1">'pandas'</span><span class="p">,</span>
    <span class="p">],</span>
    <span class="c1"># Optional extras; "dev" is what pip install -e ".[dev]" refers to (pytest is only an illustrative choice).</span>
    <span class="n">extras_require</span><span class="o">=</span><span class="p">{</span><span class="s1">'dev'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'pytest'</span><span class="p">]},</span>
    <span class="c1"># Look for packages under src/ and map the root package namespace to it.</span>
    <span class="n">packages</span><span class="o">=</span><span class="n">find_packages</span><span class="p">(</span><span class="s2">"src"</span><span class="p">,</span> <span class="n">exclude</span><span class="o">=</span><span class="p">[</span><span class="s2">"tests"</span><span class="p">]),</span>
    <span class="n">package_dir</span><span class="o">=</span><span class="p">{</span><span class="s2">""</span><span class="p">:</span> <span class="s2">"src"</span><span class="p">},</span>
<span class="p">)</span>
</pre></div>
<h2>Recommended directory structure</h2>
<p>The directory <code>setup.py</code> lives in is by definition the root directory of the package.</p>
<p>While the source code of the package can be organized in almost any structure imaginable,
it is recommended to put your package's files under <code>src/</code>.</p>
<p>If you need a different layout, change the <code>package_dir</code> and <code>packages</code> arguments in <code>setup.py</code>.</p>
<p>Here's an example:</p>
<pre><code>.
├── setup.py
├── src
│   └── my_package
│       └── __init__.py
└── tests
    └── test_my_package.py
</code></pre>
<h2>Installing the package</h2>
<p>Create and activate a virtualenv and run:</p>
<pre><code>pip install -e ".[dev]"
</code></pre>
<p>This will install your package (including development dependencies) in <em>editable mode</em>.
This means any changes to your source code will be reflected automatically, without needing to reinstall the package.</p>
<h2>Dependency management</h2>
<p>Dependencies should be listed in <code>setup.py</code> and should not be pinned to a specific version.</p>
<p>If you need version locking for reproducible deployments, generate a <code>requirements.txt</code> file.</p>
<p><code>pip freeze</code> or, even better, <code>pip-compile</code> (contained in <a href="https://github.com/jazzband/pip-tools">pip-tools</a>) can be used for this purpose:</p>
<pre><code> pip freeze --local > requirements.txt
# Or
pip-compile
</code></pre>
<h1>5/5 conference talks</h1>
<p>Jan Likar, 2022-01-07</p>
<p>Here's a list of my all-time favorite conference talks. Go watch them if you're into this kind of stuff.</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=OyfBQmvr2Hc">The Most Beautiful Program Ever Written</a> by William Byrd.</li>
<li><a href="https://www.youtube.com/watch?v=lKXe3HUG2l4">The Mess We're In</a> by Joe Armstrong.</li>
<li><a href="https://www.youtube.com/watch?v=wf-BqAjZb8M">Beyond PEP 8 -- Best practices for beautiful intelligible code</a> by Raymond Hettinger.</li>
<li><a href="https://www.youtube.com/watch?v=ZsHMHukIlJY">Seven Ineffective Coding Habits of Many Programmers</a> by Kevlin Henney.</li>
<li><a href="https://www.youtube.com/watch?v=ecIWPzGEbFc">The Future of Programming</a> by "Uncle" Bob Martin.</li>
</ul>
<h1>Understanding traditional dependency management in Python</h1>
<p>Jan Likar, 2022-01-06</p>
<p>The aim of this post is to describe current best practices surrounding traditional dependency management in Python and the elusive difference between how dependencies should be handled in library and application packages.</p>
<h2>Differences between requirements.txt and setup.py</h2>
<p>Both <code>requirements.txt</code> and <code>setup.py</code> can be used to specify dependencies, but their purposes are orthogonal.</p>
<p><code>setup.py</code> is used to specify the versions of dependencies which <em>should</em> work with the package.</p>
<p><code>requirements.txt</code> is used to store a specific combination of dependency versions so they can be installed in a reproducible manner.</p>
<h2>Dependency pinning</h2>
<p>A dependency is pinned if it is specified so that only a single version of it can be installed.</p>
<p>This is crucial for reproducibility.</p>
<p>For example, running</p>
<pre><code>pip install requests==2.0.0
</code></pre>
<p>installs a pinned version of requests.</p>
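<p>As a rough illustration, a check for whether a single requirement line is pinned could look like the sketch below. The helper <code>is_pinned</code> is hypothetical and deliberately simplified: it only looks for an <code>==</code> specifier without wildcards and ignores extras, environment markers and combined specifiers.</p>

```python
def is_pinned(requirement):
    """Return True if a line such as 'requests==2.0.0' allows exactly one version.

    Simplified on purpose: only the '==' operator without wildcards counts as pinned.
    """
    name, separator, version = requirement.partition("==")
    has_name = bool(name.strip())
    has_version = bool(version.strip())
    return has_name and separator == "==" and has_version and "*" not in version


for line in ["requests==2.0.0", "requests>=2.0.0", "requests==2.*", "requests"]:
    print(line, is_pinned(line))
```

<p>Only the first line of the loop counts as pinned; the other three still leave the installer a choice of versions.</p>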
<h3>Libraries</h3>
<p>When developing a Python library, dependency pinning is generally undesired.</p>
<p>Picture a scenario where your library (lib A) pins its dependency (lib B) to <code>=1.1.1</code>.
Another library (lib C) also needs lib B, and it specifies its version as <code>^1.1.2</code> (at least 1.1.2, but below 2.0.0).</p>
<p>If a developer's code depends on both lib A and lib C, they will be unable to install them together because no version of lib B will
ever match both <code>=1.1.1</code> and <code>^1.1.2</code>.</p>
<p>The problem with this is that lib A would most likely work just fine with version <code>1.1.2</code> of lib B (as long as <a href="https://semver.org">semantic versioning</a> is respected).</p>
<p>The developer of a dependent package can always apply additional constraints if needed, but they cannot loosen existing ones.</p>
<p>Libraries should, therefore, specify their dependencies in <code>setup.py</code> and as loosely as possible.</p>
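<p>The conflict between <code>=1.1.1</code> and <code>^1.1.2</code> can be reproduced with a toy resolver. This is only a sketch: the helpers <code>parse</code>, <code>matches_exact</code> and <code>matches_caret</code> and the list of candidate versions are made up for illustration, and the caret rule is simplified to versions with a major number of 1 or higher.</p>

```python
def parse(version):
    """Turn '1.1.2' into (1, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def matches_exact(version, pinned):
    """'=1.1.1' style constraint: only that one version is acceptable."""
    return parse(version) == parse(pinned)


def matches_caret(version, base):
    """'^1.1.2' style constraint: same major version, not older than base."""
    candidate, minimum = parse(version), parse(base)
    return candidate[0] == minimum[0] and candidate >= minimum


# Hypothetical releases of lib B.
candidates = ["1.1.0", "1.1.1", "1.1.2", "1.2.0", "2.0.0"]

# lib A pins lib B exactly, lib C asks for ^1.1.2: nothing satisfies both.
pinned = [v for v in candidates if matches_exact(v, "1.1.1") and matches_caret(v, "1.1.2")]

# If lib A used the loose constraint ^1.1.1 instead, a solution exists.
loose = [v for v in candidates if matches_caret(v, "1.1.1") and matches_caret(v, "1.1.2")]

print(pinned)
print(loose)
```

<p>With the pin in place the first list is empty; with the loosened constraint, <code>1.1.2</code> and <code>1.2.0</code> both remain installable.</p>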
<h3>Applications</h3>
<p>On the other hand, when developing deliverable applications, it is often desirable to pin dependencies to specific versions.
Only then can you be confident the application will work as designed after it is deployed.</p>
<p>Pinning might make the package harder to install in a global namespace, because version conflicts can arise.</p>
<p>This can be solved either by the developer, by packaging the application with vendored dependencies, or by the user, by installing it in a separate virtualenv.</p>
<p>See <a href="https://github.com/pypa/pipx">pipx</a> for an elegant solution.</p>
<p>The best way (I've found so far) to pin application dependencies is to specify them with loose constraints in <code>setup.py</code> and use <a href="https://github.com/jazzband/pip-tools">pip-tools</a> to generate the <code>requirements.txt</code> lock file.</p>
<h2>Modern times, modern solutions?</h2>
<p><a href="https://python-poetry.org/">Poetry</a> does <em>the right thing</em> by default so it can be used for both libraries and applications.</p>
<h1>An approach to transparent, reproducible Python development environments</h1>
<p>Jan Likar, 2022-01-05</p>
<p>This post will try to address the reasons for the "works on my machine" fiasco and ways of reliably solving it.</p>
<p>When I was starting out doing serious, team-driven software engineering, managing Python development environments was a big pain point for me. I was new to the language, new to the tooling and new to established best practices.</p>
<p>As years passed, I got better at it. And so did Python. It grew and improved in many ways. Yet clean development environment handling is still somewhat hard to get right. Luckily, we have better tooling now.</p>
<p>So, what are the reasons for Python code working on one machine, but not the other?</p>
<ol>
<li>Incompatible Python dependencies.</li>
<li>Incompatible Python interpreter versions.</li>
<li>Incompatible system library versions.</li>
<li>Incompatible OS.</li>
</ol>
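<p>When two machines disagree, a small script can help narrow down which of these reasons applies. The sketch below uses only the standard library; the function name <code>environment_fingerprint</code> and the package names passed to it are illustrative, and it does not cover system library versions (reason 3).</p>

```python
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Collect the interpreter version, the OS and the installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "packages": versions,
    }


# Run this on both machines and compare the output.
print(environment_fingerprint(["httpx", "requests"]))
```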
<h3>OS compatibility</h3>
<p>No. 4 is out of scope for this blog post, but for completeness' sake let's mention containerization as a possible solution.</p>
<p><a href="https://runnable.com/docker/python/dockerize-your-python-application">Packaging the app into a container</a> effectively isolates the code from
the host OS, so it <em>should</em> run equally well on a Windows host as on a Linux host.</p>
<p>If done right, it would also fix the other three sources of problems. It can be appropriate for some use cases - especially if the app is to be deployed to production as a container.</p>
<p>But using it is hardly "transparent". Furthermore, it introduces a new layer of complexity into your development environment, which is often undesirable.</p>
<h3>Partial solutions</h3>
<p><strong>Dependency version inconsistencies</strong> can be remedied by using virtualenvs and pinning dependencies with <a href="https://github.com/jazzband/pip-tools">pip-tools</a>, manually using <code>requirements.txt</code> or using <a href="https://python-poetry.org/">Poetry</a>. This can be enough for many simple use cases.</p>
<p>Historically, <strong>Python interpreter version inconsistencies</strong> were somewhat harder to tackle. Sometimes installing the right version of Python system-wide can help, but it is error-prone and it may break OS utilities that depend on a specific Python version. It is also not appropriate when developing multiple Python packages with different version requirements. Pyenv is a step in the right direction (see <em>Addendum B</em>), but there are better ways.</p>
<p><strong>System library version inconsistencies</strong> are very hard to solve in a maintainable way. Admittedly, they are less common, but there are very few general solutions, so they must usually be dealt with on a case-by-case basis. Installing the shared libraries manually and overriding the system libraries typically works, but nobody wants to deal with that.</p>
<h3>Isolated Python interpreters using Nix and Poetry</h3>
<p>Nix is a package manager (like brew, apt, yum) and the foundation of NixOS. But that doesn't mean it can be used only on NixOS: it also runs on other Linux distributions and macOS, and on Windows through WSL.</p>
<p>What makes it different from other package managers is the fact it builds all dependencies in isolation from the host OS and uses a clever dependency management approach which enables you to install all required packages - even if they depend on different versions of the same dependency! So-called dependency hell is completely out of the question!</p>
<p>While Nix can also be used to install Python packages, many PyPI packages are not available as Nix packages, so it makes sense to use Nix for system dependencies and Poetry for Python dependencies.</p>
<h4>Nix</h4>
<p>Firstly, <a href="https://nixos.wiki/wiki/Nix_Installation_Guide">install Nix</a>.</p>
<p>Let's say your Python project depends on Python 3.9 with httpx. You also need Terraform and awscli for deploying the app.</p>
<p>Create a shell.nix file in the root directory of your project:</p>
<div class="hll"><pre><span></span><span class="p">{</span> pkgs <span class="o">?</span> <span class="nb">import</span> <span class="l"><nixpkgs></span> <span class="p">{}</span> <span class="p">}:</span>
pkgs<span class="o">.</span>mkShell <span class="p">{</span>
  <span class="ss">buildInputs =</span> <span class="p">[</span>
    pkgs<span class="o">.</span>terraform
    pkgs<span class="o">.</span>awscli
    pkgs<span class="o">.</span>python39
    pkgs<span class="o">.</span>python39Packages<span class="o">.</span>poetry
  <span class="p">];</span>
<span class="p">}</span>
</pre></div>
<p>If you now run nix-shell in the same directory you'll notice all of the specified dependencies are available to you:</p>
<pre><code>> nix-shell
> aws --version
aws-cli/1.19.1 Python/3.9.7 Linux/5.13.0-22-generic botocore/1.20.0
> python --version
Python 3.9.6
</code></pre>
<p>Notice how they report different versions of Python. With Nix they can both coexist at the same time.</p>
<p>What's absolutely amazing about this is that once you close your current shell, the system will behave exactly like before. The new environment will be there for you, but only when you need it.</p>
<h4>Poetry</h4>
<p>Now for Python dependencies:</p>
<pre><code>> nix-shell
> poetry init
This command will guide you through creating your pyproject.toml config.
Package name [env-exp]:
Version [0.1.0]:
Description []:
Author [Jan Likar, n to skip]: n
License []:
Compatible Python versions [^3.9]: ^3.9
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file
[tool.poetry]
name = "env-exp"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
[tool.poetry.dependencies]
python = "^3.9"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
</code></pre>
<p>You can add the dependencies like this:</p>
<pre><code>> poetry add httpx
> poetry install
</code></pre>
<p>As you can see httpx is now available and ready for use:</p>
<pre><code>> poetry run python
Python 3.9.6 (default, Jun 28 2021, 08:57:49)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
>>> httpx.get("http://example.com")
<Response [200 OK]>
</code></pre>
<p>Hardly magic, but it sure feels like it is.</p>
<p><a href="https://niteo.co/blog/project-isolation-beyond-requirements-txt">Project isolation beyond requirements.txt</a> is a similar take on this approach (and the inspiration for this post).</p>
<p>See <a href="https://www.tweag.io/blog/2020-08-12-poetry2nix/">Reproducible environments with Nix</a> for packaging your app so it can be installed using Nix.</p>
<h3>Addendum A: Automatically loading the development environment</h3>
<p><a href="https://direnv.net/">Direnv</a> can be used to automatically run nix-shell when you enter your project directory.</p>
<p>Install it and add the following line to <code>.envrc</code> in the root of your project:</p>
<pre><code> use nix
</code></pre>
<p>Now run</p>
<pre><code>direnv allow .
</code></pre>
<h4>Automatic Poetry environment handling</h4>
<p>For bonus points you can also automatically load the virtualenv.</p>
<p>Add to <code>~/.direnvrc</code>:</p>
<div class="hll"><pre><span></span>layout_poetry<span class="o">()</span> <span class="o">{</span>
  <span class="k">if</span> <span class="o">[[</span> ! -f pyproject.toml <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
    log_error <span class="s1">'No pyproject.toml found. Use `poetry new` or `poetry init` to create one first.'</span>
    <span class="nb">exit</span> <span class="m">2</span>
  <span class="k">fi</span>

  <span class="c1"># create venv if it doesn't exist</span>
  poetry run <span class="nb">true</span>

  <span class="nb">export</span> <span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="k">$(</span>poetry env info --path<span class="k">)</span>
  <span class="nb">export</span> <span class="nv">POETRY_ACTIVE</span><span class="o">=</span><span class="m">1</span>
  PATH_add <span class="s2">"</span><span class="nv">$VIRTUAL_ENV</span><span class="s2">/bin"</span>
<span class="o">}</span>
</pre></div>
<p>and to <code>.envrc</code>:</p>
<pre><code>layout_poetry
</code></pre>
<p>That's it! From now on whenever you navigate to your project's root directory your shell will behave as if you ran <code>nix-shell; poetry shell</code>.</p>
<h3>Addendum B: Pyenv</h3>
<p><a href="https://github.com/pyenv/pyenv">Pyenv</a> is a Python version manager which enables switching between multiple Python interpreters.</p>
<p>For instance, to use Python 3.9.7 in the current directory, you would run:</p>
<pre><code>pyenv install 3.9.7
pyenv local 3.9.7
</code></pre>
<p>You would then install Poetry by simply running</p>
<pre><code>pip install poetry
</code></pre>
<p>While <a href="https://python-poetry.org/docs/">Poetry install instructions</a> do not recommend installing using pip, in this case it should be perfectly fine, because you're using an isolated Python interpreter. If in doubt, you can still use the official instructions or the excellent <a href="https://github.com/pypa/pipx">pipx</a>.</p>
<h1>Why pure functions matter</h1>
<p>Jan Likar, 2020-03-04</p>
<p>A (mathematically) pure function is a function that always returns the same output when called with the same parameters. Its return value depends on its parameters and its parameters only. Additionally, a pure function has no side effects.</p>
<p>To put it differently, calling them does not affect the program's environment and their results do not change if their parameters don't change. They don't access the filesystem, they don't send out network packets, nor do they output CLI messages.</p>
<p>Here's an example of a pure function in Python:</p>
<pre><code>import math

def hypotenuse(a, b):
    return math.sqrt(a**2 + b**2)
</code></pre>
<p>But don't let this fool you; even much more complex, powerful functions can be pure.</p>
<p>Impure functions include, among other things, functions performing input/output operations, randomness generators, functions spawning threads or forking, and any other non-deterministic function.</p>
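<p>For contrast, here is a sketch of an impure variant of <code>hypotenuse</code> next to a pure one. Both function names and the idea of adding measurement noise are invented for this example; the point is only that hidden randomness becomes harmless once it is passed in as a parameter.</p>

```python
import math
import random


def noisy_hypotenuse(a, b):
    """Impure: the result depends on hidden random state, not only on a and b."""
    return math.sqrt(a**2 + b**2) + random.random()


def hypotenuse_with_noise(a, b, noise):
    """Pure: the noise is an explicit parameter, so equal inputs give equal outputs."""
    return math.sqrt(a**2 + b**2) + noise


# The pure version can be called repeatedly with the same arguments and always agrees with itself.
print(hypotenuse_with_noise(3, 4, 0.5) == hypotenuse_with_noise(3, 4, 0.5))
```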
<p>While pure functions are mostly discussed in the context of functional programming languages, they are just as useful in more "traditional" languages.</p>
<p>The majority of widely-used programming languages do not recognize the concept of a pure function, yet pure functions remain extremely important for ensuring the correctness and maintainability of a codebase.</p>
<p>It is obvious we should strive to make our functions pure, whenever possible.</p>
<blockquote><p>Although practicality beats purity.</p>
</blockquote>
<p>No need to be dogmatic about it, though. Sometimes it just doesn't work.</p>
<h2>Advantages</h2>
<ol>
<li>They are easier to debug and reason about.</li>
<li>Some compilers and interpreters can differentiate pure functions from their counterparts and can apply optimizations to them. For instance, calls to pure functions can get replaced with constants, if the parameters are known at compile time.</li>
<li>They are often more reusable than impure functions.</li>
<li>It's easier to write tests and achieve high levels of test coverage because they reduce the need for mocking/stubbing.</li>
<li>Calls to pure functions that perform expensive computations can be memoized (cached).</li>
<li>Safe concurrency -- calls to a pure function can run in parallel without data races, because the function doesn't rely on shared mutable state.</li>
</ol>
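<p>Point 5 can be sketched with <code>functools.lru_cache</code> from the Python standard library. The Fibonacci function is just a stand-in for an expensive pure computation; caching is safe here precisely because the result depends on nothing but <code>n</code>.</p>

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Pure and recursive: every distinct n is computed once, later calls hit the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))
print(fib.cache_info())
```

<p>Applying the same decorator to an impure function would silently return stale results, which is one more reason to keep expensive logic pure.</p>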
<h2>Tips for maximizing the benefits of pure functions</h2>
<h3>Localize side-effects</h3>
<p>Haskell and similar languages naturally push us to structure our code in a way that separates side effects from pure computation. In conventional languages, this must be practiced more deliberately.</p>
<p>A good piece of advice I often find myself returning to is to keep all operations with side-effects as close to the entry point of the program as possible. You would, for instance, open the config file in the main part of the program (or close to it) and pass its contents to a pure function that would parse it, rather than opening the file somewhere deep in the call stack.</p>
<p>Programs that achieve this are more transparent -- it is clearer how the program interacts with its environment.</p>
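<p>The config file example could be structured roughly as follows. This is a minimal sketch assuming a made-up <code>key = value</code> file format; <code>parse_config</code> and <code>main</code> are hypothetical names.</p>

```python
def parse_config(text):
    """Pure: turns 'key = value' lines into a dict, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def main(path):
    """Impure shell: the only place that touches the filesystem and prints."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    print(parse_config(text))
```

<p>All of the parsing logic can now be tested with plain strings, without creating or mocking any files.</p>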
<h3>Isolate complexity</h3>
<p>Try to put the majority of complex program logic into pure functions while keeping impure functions as simple as possible.</p>
<p>This will greatly simplify testing, as it will maximize the amount of code you can cover with simple unit tests and reduce the number of required integration tests.</p>
<h3>Test-driven approach</h3>
<p>It is generally accepted that test-driven development can be very beneficial to the practice of software development.</p>
<p>While I am not convinced tests should always be written before the actual code, I find it very important to at least think about how the code will be tested before writing it. This tends to drive me towards having more pure functions.</p>
<h3>Make sure functions are pure</h3>
<p>In languages that don't have a type system that can differentiate between pure and impure functions, some effort is required to ensure the functions are actually pure.</p>
<p>There can be no calls to impure functions, the code must not access mutable global variables, the functions should not have any internal state (think generators in Python), etc.</p>
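<p>Mutable global state is probably the easiest of these to overlook. In the sketch below (all names are invented for illustration), <code>greet_impure</code> looks harmless, yet its result changes when the module-level dictionary is modified elsewhere.</p>

```python
settings = {"greeting": "Hello"}  # mutable module-level state


def greet_impure(name):
    """Impure: silently depends on the current contents of `settings`."""
    return f"{settings['greeting']}, {name}!"


def greet(greeting, name):
    """Pure: everything the result depends on is passed in."""
    return f"{greeting}, {name}!"


first = greet_impure("Ada")
settings["greeting"] = "Goodbye"  # some unrelated code changes the global
second = greet_impure("Ada")

print(first == second)  # same argument, different results
```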
<p>In certain programming languages, it can be hard to be 100% sure about the purity of a function. Some languages even perform impure operations, such as heap allocations, behind your back.</p>
<p>It is, however, still better to have an almost-pure function than a blatantly impure one.</p>