<center>

How to Make Library Distribution Easier
===

<big>
**A Manylinux-Based Web Building Platform**
</big>

*Written by [Angel](https://twitter.com/afreydev). Originally published 2021-01-26 on the [Monadical blog](https://monadical.com/blog.html).*

</center>

---

Last year, Monadical was contacted by the Kiwix project team for help with coding a new Python library called ‘Python-Libzim’. This library lets users access a cool C++ module called ‘Libzim’ from their Python projects.[^1] I loved working on this--it gave me the chance to see how things work under the hood in Python library creation.

[In my last post](https://monadical.com/posts/how-to-build-a-python-library.html), I explained what I’ve learned about how libraries are developed, tested, built, and published (if you’re not already familiar with the topic, check it out before reading on!).

In this post, I’m going to take a closer look at how we build wheel distributions. Wheels are important for the portability of a package: a well-built wheel can run on many different Linux flavors. The advantage of wheels is that the final user can install them quickly, since they ship pre-built and skip the compile step that source distributions require.

But there’s a catch! Wheels have to be built ahead of time, usually on the package developer’s machine, so there’s a risk that, if the final user has a different architecture (or operating system or environment), the package won’t run.

The Python community has developed some tools to address this problem. One of them is the manylinux project.

## Manylinux

[Manylinux](https://github.com/pypa/manylinux) is a project that facilitates the distribution of binary Python extensions as wheels on Linux. The end result is a package that can run on most Linux distros without any problem.
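
To make the portability question concrete: a wheel's filename records which interpreter, ABI, and platform it was built for, and pip compares those tags against the machine it runs on. Here is a minimal sketch, not taken from the Python-Libzim project, that splits a filename following the layout defined in PEP 427. The helper names `split_wheel_filename` and `is_manylinux` and the example filename are hypothetical.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components.

    Layout: {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: {}".format(filename))
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: {}".format(filename))
    return WheelTags(name, version, build, python_tag, abi_tag, platform_tag)


def is_manylinux(tags: WheelTags) -> bool:
    """Return True if any platform tag in the (dot-separated) set is manylinux."""
    return any(p.startswith("manylinux") for p in tags.platform_tag.split("."))


# Hypothetical filename, used only for illustration.
tags = split_wheel_filename("python_libzim-0.0.3-cp37-cp37m-manylinux2010_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag, is_manylinux(tags))
```

A wheel built directly on a developer's machine would typically carry a plain `linux_x86_64` platform tag instead, which is the case manylinux is meant to avoid.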
To use manylinux, you have to use the official Docker images (based on the CentOS Linux distro), which come with everything you need for the building process, including the auditwheel tool.[^2]

Before using manylinux, you should know which version of Python you want your package to run on, which Python packages your package will depend on (you should install these using pip), and which OS packages your package will depend on (you should install these using yum).

To get a quick idea of how manylinux works, check out [Python wheels manylinux build--GitHub Action](https://github.com/RalfG/python-wheels-manylinux-build). In this instance, the manylinux container mounts your source code path in the container and runs this command:

```bash
/opt/python/"${PY_VER}"/bin/pip wheel . ${PIP_WHEEL_ARGS} || { echo "Building wheels failed."; exit 1; }
```

After the container runs this command, the wheel files are generated in a folder called ‘dist’ (or the default defined in the setup.py file).

<center>

![build process](https://docs.monadical.com/uploads/upload_5a0b85eeae4f05996e434a6e66cc9169.png)

</center>

With manylinux, the preparation steps will vary with each project.
For example, these are the additional pre-build tasks that we would have needed to build the Python-Libzim project with manylinux:

```bash
# Download the binaries
wget -q https://download.openzim.org/release/libzim/libzim_linux-x86_64-6.1.8.tar.gz
# Untar the code
tar --gunzip --extract --file=libzim_linux-x86_64-6.1.8.tar.gz
# Clone
git clone https://github.com/openzim/python-libzim.git
# Run the manylinux container mounting the source code folder in /opt/src
docker run --rm -ti -v $PWD:/opt/src quay.io/pypa/manylinux2010_x86_64
# The remaining commands run inside the container, from the mounted folder
cd /opt/src
# Copy libzim libraries in some folders (to link them)
cp -p libzim_linux-x86_64-6.1.8/lib/x86_64-linux-gnu/libzim.so.6.1.8 python-libzim/lib/libzim.so
cp -p libzim_linux-x86_64-6.1.8/lib/x86_64-linux-gnu/libzim.so.6.1.8 python-libzim/lib/
ln -s $PWD/libzim_linux-x86_64-6.1.8/include/zim/ $PWD/python-libzim/include/zim
cd python-libzim/
# Updating links
ldconfig $PWD/lib
# Build the wheels for python 3.7
/opt/python/cp37-cp37m/bin/pip wheel .
```

As you can see, this process is really library-specific. Although manylinux is a reliable way to improve the Python distribution and portability of a package, it requires a bit of investment! Manylinux doesn’t have a lot of documentation, so using it takes a lot of extra work: you have to use containers, write Bash scripts, and use Linux commands.

Ultimately, manylinux is pretty involved and can be a pain to navigate for many programmers. If there were a simpler way to execute this process, just imagine how many cool Python packages there would be out there. So, how can we improve things?[^3]

## Architecture for a web building platform

I’d like to see a standardized platform for using manylinux, one that would allow it to be included in different CI processes or implemented in one click in the building process of a package. This could be done using a web building platform that requires only basic configuration--a web service that can publish a built library in the Python package repos.
The source code for the prototype I've built is available here: https://github.com/afreydev/manylinux-web.git.

Here’s what I propose for the architecture of this web building platform:

* **Manylinux builder containers:** These are containers based on the official manylinux images (2010 and 2014), but they include a little API built on Flask too. Basically, they do the building work and upload the packages to the pypiserver. These containers are started dynamically by the ManylinuxWebGUI (or a celery task) using the concept of Docker in Docker.
* **ManylinuxWebGUI:** This is a Flask application that contains a form to set the Python versions and the manylinux version you want to use for the building process.
* **Celery:** This is an instance of the Flask application, but it runs a celery worker to build projects asynchronously.
* **Pypiserver:** This is a server that stores the built wheel packages. In this project, I used the official Docker image to deploy it.

<center>

![architecture](https://docs.monadical.com/uploads/upload_d68a97e20d8af3aa3b06767d997fd963.png)

</center>

### Manylinux builder containers

This Dockerfile installs Flask and other required pip packages and copies the Flask application (server.py) into the image.

```dockerfile
FROM quay.io/pypa/manylinux2010_x86_64
ENV PLAT=manylinux2010_x86_64
ENV FLASK_APP=/opt/webserver/server.py
# Creating a venv for python packages
RUN /opt/python/cp37-cp37m/bin/python -m venv /opt/venv
RUN . /opt/venv/bin/activate && pip install Flask twine gitpython
COPY server.py /opt/webserver/
WORKDIR /opt/webserver
CMD . /opt/venv/bin/activate && flask run --host=0.0.0.0
```

### The server Flask app

This application basically wraps the Bash commands to:

* Clone the project we want to build
* Run wheel commands for the selected Python versions
* Upload the dist folder to the pypiserver
* Add some messages to return to the WebGUI or celery

```python
import subprocess

from flask import Flask, jsonify, request
from git import Repo

app = Flask(__name__)

# Maps a Python version label to its interpreter folder under /opt/python.
# The single entry below is illustrative; the real mapping lives in the project.
python_version_av = {"3.7": "cp37-cp37m"}


@app.route('/run', methods=['POST'])
def run():
    data = request.get_json()
    messages = []
    Repo.clone_from(data["git"], "/opt/src")
    for python_version in data["versions"]:
        command = "/opt/python/{}/bin/pip wheel . -w ./dist --no-deps".format(python_version_av[python_version])
        # Discard the output: an unread PIPE can block the child process
        return_v = subprocess.call(command.split(), stdout=subprocess.DEVNULL, cwd="/opt/src")
        if return_v != 0:
            messages.append({"type": "WAR", "message": "Fail building for: {}".format(python_version)})
        else:
            messages.append({"type": "MES", "message": "Success building for: {}".format(python_version)})
    # Prototype only: the credentials are hardcoded
    command = "twine upload -u monadical -p monadical --repository-url http://pypi:8080 dist/*"
    return_v = subprocess.call(command.split(), stdout=subprocess.DEVNULL, cwd="/opt/src")
    if return_v != 0:
        messages.append({"type": "WAR", "message": "Fail sending wheels to pypiserver"})
    else:
        messages.append({"type": "MES", "message": "Success publishing in pypiserver"})
    return jsonify(messages)
```

### ManylinuxWebGUI

This is a simple form, made using [wtforms](https://wtforms.readthedocs.io/en/2.3.x/), that enables Flask to get the settings from the user and launch the building process. The GUI includes a task list for monitoring the builds that are running asynchronously. If the user selects the async option, the platform will create a celery task to manage the process. If the user does not select this option, the build runs synchronously and they wait for the GUI to refresh.
```python
from flask import flash, redirect, render_template, url_for

# app, BuildForm, manylinux, db, build and build_task are assumed to be
# defined elsewhere in the project.


@app.route('/', methods=['GET', 'POST'])
def index():
    form = BuildForm()
    if form.validate_on_submit():
        settings = {
            "git": form.git.data,
            "versions": manylinux.get_versions(form),
            "manylinux_version": form.manylinux_version.data
        }
        if form.async_build.data:
            # When the user selects an async build, create a celery task
            task = build_task.delay(settings, False)
            db.create_task(app, settings, task.id)
            flash("Wait a moment for task: {}".format(task.id), "MES")
            return redirect(url_for('tasks'))
        else:
            build(settings, True)
            return redirect(url_for('index'))
    return render_template('index.html', form=form)
```

<center>

![settings screen](https://docs.monadical.com/uploads/upload_b796aa4d2e15d6283b1cc57d126070a4.png =x400)

</center>

This component involves an interesting element: the concept of Docker in Docker. The idea here is that we can run containers using the Docker SDK from another container. To get these capabilities, the trick is to mount the host docker.sock file to the same file in the container. Containers started this way run on the host's Docker daemon, as siblings of the web container.

```yaml
web-docker:
  image: web-docker
  build: web
  ports:
    - 5000:5000
  env_file:
    - .env
  depends_on:
    - redis
    - mysql
  volumes:
    - ./web:/opt/app
    - /var/run/docker.sock:/var/run/docker.sock # Expose the host's Docker daemon to this container
  command: flask run --host=0.0.0.0
  networks:
    - manylinux_network
```

After adding that line, it’s possible to use the Docker SDK for Python to create and destroy containers dynamically, based on the user’s pre-selected parameters.

```python
import random

import docker

# manylinux_versions and MANYLINUX_NETWORK are assumed to be defined
# elsewhere in the project.


def create_container_many_linux(settings):
    client = docker.from_env()
    container_name = "manylinux_{}".format(str(random.randint(0, 10000)))
    # Create a new manylinux container with a random name
    # TODO: Improve docker network discovering
    container = client.containers.run(
        manylinux_versions[settings["manylinux_version"]],
        environment=["FLASK_ENV=development"],
        network=MANYLINUX_NETWORK,
        detach=True,
        remove=True,
        name=container_name
    )
    return container_name
```

### Celery

Building packages can sometimes be a slow process.
For that reason, a good way to decouple the build step from the build request is to use a celery task. I have included a function decorated as a celery task in the Flask app. This function does the same thing as the sync process (it creates the manylinux container, sends the building settings, and receives the response messages), but it runs in a celery worker. The difference is that this flow sends the user to a page called ‘tasks’, which shows the pending and successful/failed tasks.

```python
@celery.task()
def build_task(settings, flash):
    # This wraps the build function
    return build(settings, flash)
```

<center>

![tasks screen](https://docs.monadical.com/uploads/upload_53f2ed48160ed3ce3da3ccc0a8675bb4.png =x350)

</center>

### Pypiserver

<center>

![pypi logo](https://docs.monadical.com/uploads/upload_727df792312737c8469391ab9682a589.png =x300)

</center>

[PyPI](https://pypi.org/) is the official Python package index. Usually, if you want to install a package using pip (or Pipenv), then the package will be downloaded from there. Companies that use Python as their official programming language don’t always publish all the software they produce. For that reason, I used a PyPI clone server for a local installation.

https://hub.docker.com/r/pypiserver/pypiserver/tags/
https://pypi.org/project/pypiserver/#using-the-docker-image

```yaml
pypi:
  image: pypiserver/pypiserver
  ports:
    - 8080:8080
  volumes:
    - ./packages:/data/packages
    - ./etc/pypiserver/.htpasswd:/data/.htpasswd
  command: -P .htpasswd --overwrite packages
  networks:
    - manylinux_network
```

Since this project is a prototype, I have added a single user called ‘monadical’ (with the password ‘monadical’) to the pypiserver, using a .htpasswd file (the same kind of file used for basic auth in Apache). I have also mounted a directory called ‘packages’ on /data/packages (inside the container).
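
Once the wheels are stored in the pypiserver, installing them only requires pointing pip at that index. The sketch below is an illustration rather than part of the prototype: it assumes the pypiserver is reachable at `http://localhost:8080` (based on the port mapping above) and uses the `/simple/` index path that pypiserver serves. The helper name `pip_install_command` is hypothetical.

```python
from urllib.parse import urlsplit


def pip_install_command(package, index_base="http://localhost:8080"):
    """Build the pip command that installs `package` from a private index.

    `index_base` is an assumed address for the pypiserver container.
    """
    index_url = index_base.rstrip("/") + "/simple/"
    parsed = urlsplit(index_url)
    command = ["pip", "install", "--index-url", index_url]
    if parsed.scheme == "http":
        # pip refuses plain-HTTP indexes unless the host is marked as trusted
        command += ["--trusted-host", parsed.hostname]
    command.append(package)
    return command


print(" ".join(pip_install_command("python-libzim")))
```

The command is returned as a list so that it can be passed directly to `subprocess.run` without going through a shell.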
## Conclusion

As developers, we make really good use of libraries, but we don’t tend to contribute much to their creation. I really like Python for programming--it’s a versatile and multi-purpose language, and its ecosystem is robust and stable. However, some of its processes--for example, creating wheels with manylinux--could definitely be improved!

The solution that I’ve proposed here is just a prototype. (To make it more complete, I would need to add a few more features--for example, an option for setting the operating system dependencies and a web service endpoint to integrate with other tools.) I wanted to explore an example of the kind of project we can build to improve the Python development process.

At Monadical, we love open source and we’re passionate about contributing to the open source community. More developers getting involved with the creation side of Python libraries would mean more and better Python-based software for everyone.

If you have any ideas or comments about this proposal, I want to hear them! You can contact me [@afreydev](https://twitter.com/afreydev).

[^1]: If you’re interested in the problem-solving process for this project, check out [How to Fit All Human Knowledge in a Box](https://monadical.com/posts/knowledge-in-box.html).

[^2]: At the time of writing, the manylinux images that are available are manylinux2010 and manylinux2014. The main difference between these two versions is that manylinux2010 is based on CentOS 6 (an unsupported version) and manylinux2014 on CentOS 7.

[^3]: Building or improving services for building distributions (in particular, a generic wheel building service) is an important pending task for the [Python Software Foundation](https://github.com/psf/fundable-packaging-improvements/blob/master/FUNDABLES.md#better-specifications-toolchain-and-services-for-building-distributions).
<center> <img src="https://monadical.com/static/logo-black.png" style="height: 80px"/><br/> Monadical.com | Full-Stack Consultancy *We build software that outlasts us* </center>


