How to Build a Python Library

<center>

How to Build a Python Library
===

<big>
**Share Developments and Contribute to the Community**
</big>

*Written by [Angel](https://twitter.com/afreydev). Originally published 2021-01-19 on the [Monadical blog](https://monadical.com/blog.html).*

</center>

---

Before starting at Monadical, my sysadmin and DevOps work always involved a Java and MySQL/Oracle stack. I’d used Python for some web development and web DevOps stuff before, but I’d never really had a chance to dive deeper into it. This year, I’ve been working with a professional team of Python developers at Monadical, and the learning process has been awesome!

As I’ve spent more time working with Python, I’ve realized that one of the biggest advantages it has over other programming languages is the depth and scope of its libraries. For example, there are Python libraries for:

* **Data analytics and data science:** NumPy, Pandas and Scikit-Learn
* **Web development:** Django and Flask
* **Building desktop applications:** Tkinter and PyQt

As developers (or DevOps engineers), we use libraries in our projects all the time. But the deeper I got into Python, the more I was struck by how rarely we think about the other side of things: how are libraries created? Knowing how to build libraries gives you the ability to share developments with the community, give people tools for speeding up their projects, and add cool new features to the Python ecosystem.

<center>

![shell](https://docs.monadical.com/uploads/upload_1919bd97e1a8114e0a2e932928e68042.png =x250)

</center>

## Overview of the library creation process

A Python library is a collection of modules and packages, and a package is a collection of modules that work towards a common goal. Creating a new Python package involves several processes:

* **A development process:** This step is the creation of the software itself. The development process for a Python project is not so different from that of other programming languages.
Usually, for codebase management, the team (or developer) uses a source code repository with a tool like git, hosted on GitHub, GitLab or Bitbucket. To give you an example of this, I created a toy Python Printer project containing a function that shows a message in the console. The project, available at https://github.com/afreydev/python-printer, includes the code, a functional test, and a GitHub Actions workflow.

* **Rigorous functional and integration testing:** Test-driven development has lots of advantages: you can catch bugs early, check the stability of different versions quickly, reduce manual testing (and human error), and release new versions sooner. Depending on the intended uses of the package, it might need tests to check that the functions it includes are working properly. If the package needs to work with external systems, these tests might simulate those integrations using lightweight technologies like embedded web servers or Docker containers. Tests can be configured in the CI workflow to check that the software works each time anybody updates the codebase. [Pytest](https://docs.pytest.org/en/stable/) is a good library for testing. If you want to get a better idea of concepts related to testing, have a look at this [reference](https://realpython.com/pytest-python-testing/).

* **A CI workflow to ensure that the package runs on various operating systems:** With a continuous integration workflow, you can test your software across different environments (operating systems, versions of Python, etc.). Testing your package in lots of different scenarios helps prevent bugs, and can help to reduce technical debt caused by code smells or bad practices. For our example project, Python Printer, I’ve added a workflow that triggers the different tests. You can see that it checks for syntax or style errors (in the lint job), and that the tests run on different operating systems and Python versions.
<center>

![ci process](https://docs.monadical.com/uploads/upload_4c78bc169e9aac439b6f10a17f42112d.png =x300)

</center>

This next example can be used as a template for your Python projects. It includes jobs for linting your code with [flake8](https://flake8.pycqa.org/en/latest/) and testing it using [pytest](https://docs.pytest.org/en/stable/).

```yaml
name: test

on: [push]

env:
  MAX_LINE_LENGTH: 110

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # This job has no matrix, so the Python version is fixed
      - name: Set up Python 3.6
        uses: actions/setup-python@v1
        with:
          python-version: 3.6
          architecture: x64
      - name: Lint with flake8
        run: |
          pip install flake8
          # First pass: only syntax errors and undefined names
          flake8 . --count --select=E9,F63,F7,F82 --exclude=setup.py --show-source --statistics
          # Second pass: full style check with the configured line length
          flake8 . --count --exclude=setup.py --max-line-length=$MAX_LINE_LENGTH --statistics

  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python: [3.6, 3.7, 3.8]
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Build sdist and bdist_wheel
        run: |
          pip install setuptools pip wheel
          python3 setup.py sdist bdist_wheel
      - name: Test package with pytest
        run: |
          pip install pytest
          python setup.py install
          pytest
```

* **A distribution building and publishing process:** The building component of this process basically involves configuring your package (creating a `setup.py` file) and running some build commands. The publishing component consists of uploading your distribution to a repo like [PyPI](https://pypi.org/). The rest of this article goes into more detail about this step.

## Configuring your package

`setup.py` is an important part of the installation process for a Python package. This file describes the distribution that we want to create from our code.
It also lists the other distributions that your software uses as dependencies, and it can be simple or complex depending on the kind of software you have created. Until quite recently, the setup was done using the [distutils](https://docs.python.org/3/library/distutils.html) library (the standard tool), but this is gradually being replaced by [setuptools](https://setuptools.readthedocs.io/en/latest/). A minimalistic `setup.py` can include only the packages field:

```python
import setuptools

# Minimal configuration: only the packages to include are declared
setuptools.setup(
    packages=setuptools.find_packages(),
)
```

This minimal approach works, but a lot of metadata is missing (name, version and packages are the “required” fields, and only packages is set here), and finding the package in a repo like PyPI without this metadata is virtually impossible.[^1] Here’s another example that includes more information:

```python
import pathlib

from setuptools import setup

# Directory that contains this setup.py
BASE_PATH = pathlib.Path(__file__).parent

# Reuse the README as the long description of the distribution
README = (BASE_PATH / "README.md").read_text(encoding="utf-8")

setup(
    name="python-printer",
    version="1.0.0",
    description="Print awesome messages in your screen",
    long_description=README,
    long_description_content_type="text/markdown",
    url="https://github.com/afreydev/python-printer",
    author="Your Name",
    author_email="your-email@gmail.com",
    license="MIT",
    packages=["python_printer"],
    include_package_data=True,
)
```

This file is what makes it possible to install a Python distribution directly from the source code:

```bash
python setup.py install
```

<center>

![pypi logo](https://docs.monadical.com/uploads/upload_e0d7aeed292ee9c6d9a090225f9bb28b.png =x300)

</center>

## Building distributions

There are two possible methods for building distributions: creating a source distribution and creating a wheel. Ultimately, the difference between the two comes down to where the code is compiled, or built. This might not seem like a big deal, but it introduces some important differences:

* **sdist:** This is an easy way to package the source distribution of your software.
The building process occurs on the user’s side and requires users to have correctly configured their machines by installing the proper Python libraries and operating system software. This kind of package can be bigger than a wheel, because the source distribution includes all the source code (possibly including C or C++ extensions) that is needed to build it (if you want to use this kind of package, you need to build it first). From a developer’s perspective, creating a source distribution is easy, but it makes more work for the consumer. To create a source package, you just need to run the following command:

  ```bash
  python setup.py sdist
  ```

  This command creates a folder called `dist` with a file inside it. On macOS and Linux, the generated file is a tarball. On Windows, older Python versions generated a zip file by default, while recent versions generate a tarball there too.

  This kind of distribution does have some disadvantages. Installing an sdist means running build code such as `setup.py` with distutils or setuptools in the user’s environment, which makes the process slow and is one more thing to maintain.

* **wheels:** This is another method for building and packaging your software. With this method, the building process happens on the machine of the producer of the package. There are a few different kinds of wheel:

  * Universal wheels: These wheels work for both Python 2 and Python 3. They can be installed anywhere.
  * Pure-Python wheels: These wheels support either Python 2 or Python 3 (but not both).
  * Platform wheels: These wheels support specific platforms and Python versions (including different operating systems and architectures).

  There are some [Docker images](https://quay.io/organization/pypa) that help to build platform wheels. These images package the libraries so that they can run on most Linux platforms. The Python Software Foundation is focused on improving this method for future use.
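As a rough illustration of the three kinds of wheel listed above, the kind can usually be read from the tags at the end of a wheel’s filename, which follows the PEP 427 pattern `name-version-pythontag-abitag-platformtag.whl`. The sketch below is not part of the Python Printer project: `classify_wheel` is a hypothetical helper, the `python_printer` filename is an assumed build output, and the rules are a simplification of the full tag scheme.

```python
def classify_wheel(filename: str) -> str:
    """Guess the kind of wheel from its filename tags (simplified PEP 427 rules)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")

    # name-version[-build]-pythontag-abitag-platformtag.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]

    # Any ABI or platform restriction means the wheel is tied to a platform
    if abi_tag != "none" or platform_tag != "any":
        return "platform"

    # The python tag may list several interpreters, e.g. "py2.py3"
    majors = {tag[2:3] for tag in python_tag.split(".")}
    if {"2", "3"} <= majors:
        return "universal"
    return "pure-python"


if __name__ == "__main__":
    examples = [
        "six-1.15.0-py2.py3-none-any.whl",
        "python_printer-1.0.0-py3-none-any.whl",  # assumed build output
        "numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl",
    ]
    for name in examples:
        print(name, "->", classify_wheel(name))
```

Splitting on `-` works here because hyphens in a distribution name are replaced with underscores in the wheel filename, so the last three fields are always the tags.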
<center>

![wheel process](https://docs.monadical.com/uploads/upload_159498e5982b3c52e01ccbc04780b5be.png =x200)

</center>

With wheels, installation is faster than with a source distribution because wheels don’t require a build process and don’t depend on tools like distutils and setuptools. In this case, we don’t need compilers for the C extensions: everything is in the box. Finally, continuous integration runs are faster because wheels can be cached more efficiently.

A simple way to create a wheel is to run the following command:

```bash
python setup.py bdist_wheel
```

You can create the source distribution and the wheel at the same time too:

```bash
python setup.py sdist bdist_wheel
```

Though you can get away with skipping this step, it’s a good idea to ensure that your wheels can run on different operating systems. There are tools that help with this, for example, the [manylinux](https://github.com/pypa/manylinux) project. Manylinux facilitates the distribution of binary Python extensions as wheels on Linux. The end result is a package that can run on most Linux distros without any problem. To use manylinux, you have to use the official Docker images (based on the CentOS Linux distro), which come with everything you need for the building process, including the auditwheel tool.

Knowing about the different compiling and building options for your package is important if you want it to have good uptake in the developer community. If your library only includes code (without external dependencies), a source distribution could be enough. However, some projects need to work across different Python versions (Python 3.6, 3.7 and 3.8) or depend on external software, and in these cases wheels are a better option.

## Publishing a distribution

After building the distribution, we can publish our Python packages in a repo like PyPI.
The Python Package Index (PyPI) is the official third-party software repository for Python, and package managers like pip use it as the default source for downloading packages and their dependencies. An easy way to upload our packages to PyPI is to use the [twine tool](https://twine.readthedocs.io/en/latest/). To install it, you can use pip:

```bash
pip install twine
```

Before uploading your distribution to PyPI, you have to create a PyPI account. This is easy to do: just fill out the form at https://pypi.org/account/register/ and confirm your email. After that, you can run this command to upload your distribution:

```bash
twine upload dist/*
```

You will see something like this in your shell:

<center>

![twine command](https://docs.monadical.com/uploads/upload_313eb107fe86e8eb629f95c6eeba3f29.png =x150)

</center>

After this step, your package will be available for developers around the world to download and use.

<center>

![package published](https://docs.monadical.com/uploads/upload_fea19e0728dcd18d39aab397f7f724ba.png =x300)

</center>

## Conclusion

As developers, we don’t spend a lot of time thinking about library creation: a lot of experienced Python developers have just never had to publish a new library. But, in general, the creation and publishing process is pretty easy, and it’s a great way to share developments with the community.

In my next post, I’m going to look at how we can make the distribution process for new libraries more accessible for developers. I’ll be digging into the details of the manylinux project and sharing my ideas for how to improve it.

[^1]: https://www.python.org/dev/peps/pep-0518/ explains more about the importance of these fields

<center>
<img src="https://monadical.com/static/logo-black.png" style="height: 80px"/><br/>
Monadical.com | Full-Stack Consultancy

*We build software that outlasts us*
</center>

Angel Rey

Devops, Technology and Innovation @afreydev

Published on 2021-01-19