<center>
Implementing C++ Virtual Functions in Cython
===
*Written by [JDC](https://twitter.com/jdcaballerov). Originally published 2021-02-02 on the [Monadical blog](https://monadical.com/blog.html).*
</center>
___
*TL;DR: This is an extensive article that assumes Cython knowledge. It describes two strategies for using C++ code from Python when that code requires implementing the virtual functions of C++ abstract classes. Elements of these solutions have been discussed before, but they are scattered across forums, GitHub issues, and Stack Overflow. The first strategy implements C++ wrapper classes and then wraps them with Cython classes. The second strategy allows us to write the virtual functions for the abstract classes in Cython/Python.*
## Introduction
In [an earlier post](https://monadical.com/posts/knowledge-in-box.html), we described a project we did for [Kiwix](https://www.kiwix.org/en/about/), which we had a lot of fun working on. Kiwix offers an awesome service--an offline reader that makes a huge amount of online content available to people who have little or no internet access. [More than half of the world’s population](https://www.itu.int/dms_pub/itu-s/opb/pol/S-POL-BROADBAND.18-2017-PDF-E.pdf) is in this position, due to infrastructure issues, censorship, or affordability. That’s 4 billion people without access to powerful resources like Wikipedia and YouTube that the rest of us take for granted. Kiwix scrapes sites like these in their entirety and stores them using a library called [‘libzim’](https://github.com/openzim/libzim). Libzim packages the content into a [ZIM file](https://wiki.openzim.org/wiki/ZIM_file_format), which can be easily saved to a phone, computer, or USB. Users can then browse websites as if they were online--all for free!
![kiwix logo](https://docs.monadical.com/uploads/upload_6049f777464802e30b5c4b226bcc5798.png)
Now, libzim is a C++ library and Kiwix’s content scraper is written in Python. Kiwix was originally writing all the scraped content to disk and then using another tool to bundle it into a ZIM file with libzim. But this was pretty slow and took up too much disk space. You have to imagine the sheer magnitude of information Kiwix is dealing with here. Wikipedia on its own is vast--never mind when you add to that [YouTube](https://www.youtube.com/), [Project Gutenberg](https://www.gutenberg.org/), [Stack Exchange](https://stackexchange.com/), [Codecademy](https://www.codecademy.com/), [TED](https://www.ted.com/)...! And we’re not just talking about a one-shot process: Kiwix has to regularly download, organize, and bundle content to keep up with changes to these resources. The time had come to optimize this process.
![zim file](https://docs.monadical.com/uploads/upload_a6d949cbffbae82d7a340e6c7c3f0594.png)
This is where Monadical came in: our job was to develop Python bindings for libzim, replacing the intermediary disk and speeding up the communication between libzim and the scraper. If you want a higher-level overview of this project, you can check out [How to Fit All Human Knowledge in a Box](https://monadical.com/posts/knowledge-in-box.html). In this post, I want to explain the technical details of our solution and how we were able to bind Python with libzim by implementing virtual functions in Cython.
### The Challenge
After considering [different binding alternatives](https://realpython.com/python-bindings-overview/), we decided that [Cython](https://github.com/cython/cython) was the best fit because of its documentation, its active community, and the projects that use it, such as [numpy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html).
For each scraped piece of content (bytes), i.e., web pages, images, videos, JS, and other resources, we needed to construct an article and call the `addArticle` [function](https://github.com/openzim/libzim/blob/7716d8545bf26f7c3cc381affdcf8d2ca63e1768/include/zim/writer/creator.h#L48) on the `zim::writer::Creator` [class](https://github.com/openzim/libzim/blob/master/include/zim/writer/creator.h) within the library. This function has the following prototype:
```cpp
void addArticle(std::shared_ptr<Article> article);
```
However, this is how the class `zim::writer::Article`, the type of the `article` parameter, is [defined](https://github.com/openzim/libzim/blob/master/include/zim/writer/article.h):
```cpp
class Article
{
public:
virtual Url getUrl() const = 0;
virtual std::string getTitle() const = 0;
virtual bool isRedirect() const = 0;
virtual bool isLinktarget() const;
virtual bool isDeleted() const;
virtual std::string getMimeType() const = 0;
virtual bool shouldCompress() const = 0;
virtual bool shouldIndex() const = 0;
virtual Url getRedirectUrl() const = 0;
virtual zim::size_type getSize() const = 0;
virtual Blob getData() const = 0;
virtual std::string getFilename() const = 0;
virtual ~Article() = default;
// returns the next category id, to which the article is assigned to
virtual std::string getNextCategory();
};
```
Notice that the `Article` class consists almost entirely of [pure virtual functions](https://www.geeksforgeeks.org/pure-virtual-functions-and-abstract-classes/), which makes it an abstract class. That means we needed to provide `Article` objects that implement the interface, so that `libzim` can call those functions on them, e.g., to obtain the contents, title, etc. when writing the content to disk.
The first strategy we considered was to implement the `Article` interface in a `C++` wrapper class that keeps its data members in C++ Land, then wrap that `C++` class with a Cython class, and pass the contents in from Python/Cython to obtain a property-based pythonic API.
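For readers who think more readily in Python, the idea of an abstract class can be sketched with the standard `abc` module. This is only an analogy, not part of the libzim bindings: the class and method names are hypothetical stand-ins, and the default return value of `is_deleted` is an assumption made for illustration.

```python
from abc import ABC, abstractmethod


class Article(ABC):
    """Hypothetical Python stand-in for an abstract interface like zim::writer::Article."""

    @abstractmethod
    def get_title(self) -> str:
        """Analogue of a pure virtual function: subclasses must supply a body."""

    def is_deleted(self) -> bool:
        """Analogue of a non-pure virtual function (the default value is an assumption)."""
        return False


class HelloArticle(Article):
    """Concrete subclass that supplies the missing implementation."""

    def get_title(self) -> str:
        return "Hello"


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abstract methods block instantiation."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Just as C++ refuses to instantiate `zim::writer::Article` directly, `can_instantiate(Article)` is false here, while the concrete subclass can be created and passed to any code that only knows about the interface.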
Let's see how that works.
## Part 1: Basic Cython Wrapper
To start with, let's create a minimal `setup.py` file for our Cython wrapper.
`setup.py`
```python
import os

# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import Extension, setup
from Cython.Build import cythonize


def read(fname):
    """Return the contents of a file that sits next to this setup.py."""
    with open(os.path.join(os.path.dirname(__file__), fname)) as f:
        return f.read()


setup(
    name="python-libzim",
    version="0.0.1",
    author="Monadical Inc - Juan Diego Caballero",
    author_email="jdc@monadical.com",
    description="A python-facing API for creating and interacting with ZIM files",
    license="GPLv3",
    long_description=read("README.md"),
    long_description_content_type="text/markdown",
    ext_modules=cythonize(
        [
            Extension(
                "libzim",
                ["libzim/*.pyx", "libzim/wrappers.cpp"],
                libraries=["zim"],
                extra_compile_args=["-std=c++11"],
                language="c++",
            ),
        ],
        compiler_directives={"language_level": "3"},
    ),
)
```
### A Simple `C++` Wrapper
To implement our abstract class, let's define a `zim::writer::Article` wrapper named `ArticleWrapper` that will implement the required functions:
`libzim/wrappers.cpp`
```cpp
#include <string>
#include <iostream>
#include <zim/zim.h>
#include <zim/article.h>
#include <zim/blob.h>
#include <zim/writer/article.h>
#include <zim/file.h>
#include <zim/search.h>
#include <zim/writer/creator.h>

class ArticleWrapper : public zim::writer::Article
{
public:
    virtual ~ArticleWrapper() = default;

    ArticleWrapper(char ns, // namespace
                   std::string url,
                   std::string title,
                   std::string mimeType,
                   std::string redirectUrl,
                   bool _shouldIndex,
                   std::string content)
        : ns(ns),
          url(url),
          title(title),
          mimeType(mimeType),
          redirectUrl(redirectUrl),
          _shouldIndex(_shouldIndex),
          content(content)
    {
    }

    // Public data members, read and written directly from Cython
    char ns;
    std::string url;
    std::string title;
    std::string mimeType;
    std::string redirectUrl;
    bool _shouldIndex;
    std::string content;
    std::string fileName;

    // Virtual member function implementations
    // ....
    virtual std::string getTitle() const
    {
        return title;
    }
    // ....
    virtual std::string getMimeType() const
    {
        return mimeType;
    }
};
```
This simplified implementation defines functions over its data members as follows:
```cpp
virtual std::string getTitle() const
{
    return title;
}

virtual zim::Blob getData() const
{
    return zim::Blob(&content[0], content.size());
}
```
To let a creator's `addArticle` function accept our newly defined `ArticleWrapper` class and to set some defaults, we subclass the original `zim::writer::Creator` as `OverriddenZimCreator` and wrap it in a `CreatorWrapper` class as follows:
`libzim/wrappers.cpp` (continued)
```cpp
class OverriddenZimCreator : public zim::writer::Creator
{
public:
    OverriddenZimCreator(std::string mainPage)
        : zim::writer::Creator(true),
          mainPage(mainPage) {}

    virtual zim::writer::Url getMainUrl()
    {
        return zim::writer::Url('A', mainPage);
    }

    void setMainUrl(std::string newUrl)
    {
        mainPage = newUrl;
    }

    std::string mainPage;
};

class CreatorWrapper
{
public:
    CreatorWrapper(OverriddenZimCreator *creator) : _creator(creator)
    {
    }

    ~CreatorWrapper()
    {
        delete _creator;
    }

    static CreatorWrapper *create(std::string fileName, std::string mainPage, std::string fullTextIndexLanguage, int minChunkSize)
    {
        bool shouldIndex = !fullTextIndexLanguage.empty();
        OverriddenZimCreator *c = new OverriddenZimCreator(mainPage);
        c->setIndexing(shouldIndex, fullTextIndexLanguage);
        c->setMinChunkSize(minChunkSize);
        c->startZimCreation(fileName);
        return (new CreatorWrapper(c));
    }

    void addArticle(std::shared_ptr<ArticleWrapper> article)
    {
        _creator->addArticle(article);
    }

    void setMainUrl(std::string newUrl)
    {
        _creator->setMainUrl(newUrl);
    }

    zim::writer::Url getMainUrl()
    {
        return _creator->getMainUrl();
    }

    OverriddenZimCreator *_creator;
};
```
Up to this point, we have ordinary `C++` wrapping code. We haven't written any Cython-related code other than the minimal `setup.py`.
### Defining the interface
To use the wrapper's and library's code from Cython/Python, we need to let Cython know about the interface. We use the `*.pxd` files for this. Let's describe all the functions and data types to be used or called from our Cython/Python code:
`zim_wrapper.pxd`
```python
from libcpp.string cimport string
from libc.stdint cimport uint32_t, uint64_t
from libcpp cimport bool
from libcpp.memory cimport shared_ptr, unique_ptr

cdef extern from "zim/zim.h" namespace "zim":
    ctypedef uint32_t size_type
    ctypedef uint64_t offset_type

cdef extern from "zim/blob.h" namespace "zim":
    cdef cppclass Blob:
        char* data() except +
        char* end() except +
        int size() except +

cdef extern from "zim/writer/url.h" namespace "zim::writer":
    cdef cppclass Url:
        string getLongUrl() except +

cdef extern from "wrappers.cpp":
    cdef cppclass ArticleWrapper:
        ArticleWrapper(char ns,
                       string url,
                       string title,
                       string mimeType,
                       string redirectUrl,
                       bool _shouldIndex,
                       string content) except +
        string getTitle() except +
        const Blob getData() except +
        string getMimeType() except +
        bool isRedirect() except +
        Url getUrl() except +
        Url getRedirectUrl() except +
        # Public data members, exposed so Cython can read and write them
        char ns
        string url
        string title
        string mimeType
        string redirectUrl
        string content

    cdef cppclass CreatorWrapper:
        @staticmethod
        CreatorWrapper *create(string fileName, string mainPage, string fullTextIndexLanguage, int minChunkSize) except +
        void addArticle(shared_ptr[ArticleWrapper] article) except +
        Url getMainUrl() except +
        void setMainUrl(string) except +
```
### The Cython Wrapper
With the `C++` wrappers defined in `wrappers.cpp` and a description of the interface that Cython can understand in `zim_wrapper.pxd`, let's write the Cython/Python code that will use and call the C++ functions and data.
Cython requires that we write `*.pyx` files that allow us to combine Cython/Python code with `C++`.
The strategy for getting a pythonic API is to wrap the `C++` data types and class functions with Cython classes `cdef class ZimArticle` and `cdef class ZimCreator`.
Each class holds an "internal" pointer to an object of its corresponding `C++` class (either `ArticleWrapper` or `CreatorWrapper`) and wraps the functions and data types.
Let's start by defining Cython's `ZimArticle` and a `__cinit__` to set its "internal" pointer:
`pyzim.pyx`
```python
# This imports all the definitions from zim_wrapper.pxd (ArticleWrapper, etc)
cimport zim_wrapper as zim

cdef class ZimArticle:
    # This is an "internal" pointer to hold a C++ ArticleWrapper
    cdef zim.ArticleWrapper *c_zim_article

    def __cinit__(self, url="", content="", namespace="A", mimetype="text/html",
                  title="", redirect_article_url="", should_index=True):
        # Creates a new ArticleWrapper object
        c_zim_art = new zim.ArticleWrapper(ord(namespace),                        # namespace
                                           url.encode('UTF-8'),                   # url
                                           title.encode('UTF-8'),                 # title
                                           mimetype.encode('UTF-8'),              # mimeType
                                           redirect_article_url.encode('UTF-8'),  # redirectUrl
                                           should_index,                          # shouldIndex
                                           content.encode('UTF-8'))               # content (str -> bytes)
        self.c_zim_article = c_zim_art

    def __dealloc__(self):
        # Free the C++ object when the Python object is garbage collected
        if self.c_zim_article != NULL:
            del self.c_zim_article
```
The code is straightforward: the `__cinit__` constructor takes its input from Python, creates a new `C++` `ArticleWrapper` object, and stores it in the class's "internal" pointer `c_zim_article`.
Now let's write Python wrappers for `ArticleWrapper` data members.
Fortunately, Cython has a neat feature that enables a very pythonic property-based API. Notice that when we wrote `zim_wrapper.pxd` above, we also included `ArticleWrapper`'s public data members (e.g., `char ns`, `string title`, etc.):
```python
cdef extern from "wrappers.cpp":
    cdef cppclass ArticleWrapper:
        ArticleWrapper(char ns,
                       string url,
                       string title,
                       string mimeType,
                       string redirectUrl,
                       bool _shouldIndex,
                       string content) except +
        string getTitle() except +
        const Blob getData() except +
        string getMimeType() except +
        bool isRedirect() except +
        Url getUrl() except +
        Url getRedirectUrl() except +
        char ns
        string url
        string title
        string mimeType
        string redirectUrl
        string content
```
This allows us to use very simple accessor methods (setters, getters) on the Python/Cython side. We can access the public members with a simple dot operator:
```python
@property
def title(self):
    """Get the article's title"""
    return self.c_zim_article.title.decode('UTF-8')

@title.setter
def title(self, new_title):
    """Set the article's title"""
    self.c_zim_article.title = new_title.encode('UTF-8')
```
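The same accessor pattern can be tried out in plain Python without compiling anything. The sketch below is only an illustration: `FakeArticleStruct` is a hypothetical stand-in for the C++ object and simply stores `bytes`, which is how a C++ `std::string` appears on the Python side.

```python
class FakeArticleStruct:
    """Hypothetical stand-in for the C++ ArticleWrapper's public data members."""

    def __init__(self) -> None:
        self.title = b""  # a std::string is seen as bytes from Python


class ArticleView:
    """Mimics how ZimArticle exposes a bytes field as a str property."""

    def __init__(self) -> None:
        self._c_article = FakeArticleStruct()

    @property
    def title(self) -> str:
        # Decode on the way out so callers only ever see str
        return self._c_article.title.decode("UTF-8")

    @title.setter
    def title(self, new_title: str) -> None:
        # Encode on the way in so the underlying field stays bytes
        self._c_article.title = new_title.encode("UTF-8")


view = ArticleView()
view.title = "ñññ Monadical"
```

The caller works with `str` throughout, while the wrapped object only ever holds UTF-8 encoded bytes.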
We follow a similar strategy for wrapping the C++ `CreatorWrapper` in Cython.
However, notice that the Cython wrapper for `add_article` accepts Cython `ZimArticle` objects. This keeps the API pythonic for callers, while the function itself takes care of the internal details: it dereferences the article's internal pointer and uses `make_shared` to copy-construct a new `ArticleWrapper` owned by a shared pointer:
```python
def add_article(self, ZimArticle article):
    """Add a ZimArticle to the Creator object.

    Parameters
    ----------
    article : ZimArticle
        The article to add to the file
    """
    # Make a shared pointer to ArticleWrapper from the ZimArticle object (dereference internal c_zim_article)
    cdef shared_ptr[zim.ArticleWrapper] art = make_shared[zim.ArticleWrapper](dereference(article.c_zim_article))
    self.c_creator.addArticle(art)
```
A complete file with the simplified implementation will look like this:
#### Implementation
`pyzim.pyx`
```python
from libcpp.string cimport string
from libcpp cimport bool
from libcpp.memory cimport shared_ptr, unique_ptr, make_shared

import datetime
import copy
from collections import defaultdict

# Cython operators are brought in with cimport
from cython.operator cimport dereference, preincrement

cimport zim_wrapper as zim
#########################
#       ZimArticle      #
#########################

cdef class ZimArticle:
    """
    A class to represent a Zim Article.

    Attributes
    ----------
    *c_zim_article : zim.ArticleWrapper
        a pointer to the C++ article object

    Properties
    -----------
    namespace : str
        the article namespace
    title : str
        the article title
    content : str
        the article content
    longurl : str
        the article long url i.e {NAMESPACE}/{redirect_url}
    url : str
        the article url
    mimetype : str
        the article mimetype
    is_redirect : bool
        flag if the article is a redirect
    redirect_longurl: str
        the long redirect article url i.e {NAMESPACE}/{redirect_url}
    redirect_url : str
        the redirect article url
    """
    cdef zim.ArticleWrapper *c_zim_article

    VALID_NAMESPACES = ["-", "A", "B", "I", "J", "M", "U", "V", "W", "X"]

    def __cinit__(self, url="", content="", namespace="A", mimetype="text/html",
                  title="", redirect_article_url="", should_index=True):
        """Constructs a ZimArticle from parameters.

        Parameters
        ----------
        url : str
            Article url without namespace
        content : str - bytes
            Article content either str or bytes
        namespace : {"A","-","B","I","J","M","U","V","W","X"}
            Article namespace (the default is A)
        mimetype : str
            Article mimetype (the default is text/html)
        title : str
            Article title
        redirect_article_url : str
            Article redirect if article is a redirect
        should_index : bool
            Flag if article should be indexed (the default is True)
        """
        # Content is handed to C++ as bytes; str input is encoded as UTF-8
        bytes_content = b''
        if isinstance(content, str):
            bytes_content = content.encode('UTF-8')
        else:
            bytes_content = content

        if namespace not in self.VALID_NAMESPACES:
            raise ValueError("Invalid Namespace")

        c_zim_art = new zim.ArticleWrapper(ord(namespace),                        # namespace
                                           url.encode('UTF-8'),                   # url
                                           title.encode('UTF-8'),                 # title
                                           mimetype.encode('UTF-8'),              # mimeType
                                           redirect_article_url.encode('UTF-8'),  # redirectUrl
                                           should_index,                          # shouldIndex
                                           bytes_content)                         # content
        self.__setup(c_zim_art)

    def __dealloc__(self):
        if self.c_zim_article != NULL:
            del self.c_zim_article

    cdef __setup(self, zim.ArticleWrapper *art):
        """Assigns an internal pointer to the wrapped C++ article object.

        A python ZimArticle always maintains a pointer to a wrapped zim.ArticleWrapper C++ object.
        The python object reflects the state, accessible with properties, of a wrapped C++ zim.ArticleWrapper,
        this ensures a valid wrapped article that can be passed to a zim.CreatorWrapper.

        Parameters
        ----------
        *art : zim.ArticleWrapper
            Pointer to a C++ article object
        """
        # Set new internal C++ zim.ArticleWrapper article
        self.c_zim_article = art

    @property
    def namespace(self):
        """Get the article's namespace"""
        return chr(self.c_zim_article.ns)

    @namespace.setter
    def namespace(self, new_namespace):
        """Set the article's namespace"""
        if new_namespace not in self.VALID_NAMESPACES:
            raise ValueError("Invalid Namespace")
        self.c_zim_article.ns = ord(new_namespace[0])

    @property
    def title(self):
        """Get the article's title"""
        return self.c_zim_article.title.decode('UTF-8')

    @title.setter
    def title(self, new_title):
        """Set the article's title"""
        self.c_zim_article.title = new_title.encode('UTF-8')

    @property
    def content(self):
        """Get the article's content"""
        data = self.c_zim_article.content
        try:
            return data.decode('UTF-8')
        except UnicodeDecodeError:
            # Binary content (e.g. images) is returned as raw bytes
            return data

    @content.setter
    def content(self, new_content):
        """Set the article's content"""
        if isinstance(new_content, str):
            self.c_zim_article.content = new_content.encode('UTF-8')
        else:
            self.c_zim_article.content = new_content

    @property
    def longurl(self):
        """Get the article's long url i.e {NAMESPACE}/{url}"""
        return self.c_zim_article.getUrl().getLongUrl().decode("UTF-8", "strict")

    @property
    def url(self):
        """Get the article's url"""
        return self.c_zim_article.url.decode('UTF-8')

    @url.setter
    def url(self, new_url):
        """Set the article's url"""
        self.c_zim_article.url = new_url.encode('UTF-8')

    @property
    def redirect_longurl(self):
        """Get the article's redirect long url i.e {NAMESPACE}/{redirect_url}"""
        return self.c_zim_article.getRedirectUrl().getLongUrl().decode("UTF-8", "strict")

    @property
    def redirect_url(self):
        """Get the article's redirect url"""
        return self.c_zim_article.redirectUrl.decode('UTF-8')

    @redirect_url.setter
    def redirect_url(self, new_redirect_url):
        """Set the article's redirect url"""
        self.c_zim_article.redirectUrl = new_redirect_url.encode('UTF-8')

    @property
    def mimetype(self):
        """Get the article's mimetype"""
        return self.c_zim_article.mimeType.decode('UTF-8')

    @mimetype.setter
    def mimetype(self, new_mimetype):
        """Set the article's mimetype"""
        self.c_zim_article.mimeType = new_mimetype.encode('UTF-8')

    @property
    def is_redirect(self):
        """Get if the article is a redirect"""
        return self.c_zim_article.isRedirect()

    def __repr__(self):
        return f"{self.__class__.__name__}(url={self.longurl}, title={self.title})"

#########################
#       ZimCreator      #
#########################

cdef class ZimCreator:
    """
    A class to represent a Zim Creator.

    Attributes
    ----------
    *c_creator : zim.CreatorWrapper
        a pointer to the C++ Creator object
    _filename : str
        zim filename
    """
    cdef zim.CreatorWrapper *c_creator
    cdef object _filename
    cdef object _main_page
    cdef object _index_language
    cdef object _min_chunk_size

    def __cinit__(self, str filename, str main_page="", str index_language="eng", min_chunk_size=2048):
        """Constructs a ZimCreator from parameters.

        Parameters
        ----------
        filename : str
            Zim file path
        main_page : str
            Zim file main_page
        index_language : str
            Zim file index language (default eng)
        min_chunk_size : int
            Minimum chunk size (default 2048)
        """
        self.c_creator = zim.CreatorWrapper.create(filename.encode("UTF-8"), main_page.encode("UTF-8"), index_language.encode("UTF-8"), min_chunk_size)
        self._filename = filename
        self._main_page = self.c_creator.getMainUrl().getLongUrl().decode("UTF-8", "strict")
        self._index_language = index_language
        self._min_chunk_size = min_chunk_size

    @property
    def filename(self):
        """Get the filename of the ZimCreator object"""
        return self._filename

    @property
    def main_page(self):
        """Get the main page of the ZimCreator object"""
        # Drop the leading "{NAMESPACE}/" prefix from the long url
        return self.c_creator.getMainUrl().getLongUrl().decode("UTF-8", "strict")[2:]

    @main_page.setter
    def main_page(self, new_url):
        """Set the main page of the ZimCreator object"""
        # Check if url longformat is used (guard against urls shorter than two characters)
        if len(new_url) > 1 and new_url[1] == '/':
            raise ValueError("Url should not include a namespace")
        self.c_creator.setMainUrl(new_url.encode('UTF-8'))

    @property
    def index_language(self):
        """Get the index language of the ZimCreator object"""
        return self._index_language

    @property
    def min_chunk_size(self):
        """Get the minimum chunk size of the ZimCreator object"""
        return self._min_chunk_size

    def add_article(self, ZimArticle article):
        """Add a ZimArticle to the Creator object.

        Parameters
        ----------
        article : ZimArticle
            The article to add to the file
        """
        # Make a shared pointer to ArticleWrapper from the ZimArticle object (dereference internal c_zim_article)
        cdef shared_ptr[zim.ArticleWrapper] art = make_shared[zim.ArticleWrapper](dereference(article.c_zim_article))
        self.c_creator.addArticle(art)

    def __repr__(self):
        return f"{self.__class__.__name__}(filename={self.filename})"
```
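One detail of the listing above is worth isolating: content may arrive as either `str` or `bytes`, and on the way back out it is decoded only if it is valid UTF-8. A minimal pure-Python sketch of that round trip follows; the helper names `to_bytes` and `decode_or_raw` are hypothetical and do not appear in the bindings.

```python
from typing import Union


def to_bytes(content: Union[str, bytes]) -> bytes:
    """Encode str as UTF-8 and pass bytes through, as the constructor does."""
    if isinstance(content, str):
        return content.encode("UTF-8")
    return content


def decode_or_raw(data: bytes) -> Union[str, bytes]:
    """Return text when the bytes are valid UTF-8, otherwise the raw bytes."""
    try:
        return data.decode("UTF-8")
    except UnicodeDecodeError:
        return data
```

This is why an HTML page set as `str` reads back as `str`, while binary payloads such as images would read back as `bytes`.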
### Compiling the Extension
To compile our extension, run the following command:
```bash
python3 setup.py build_ext -i
```
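After an in-place build, a quick way to confirm that the compiled module can be found is to ask Python's import machinery, without actually importing it. This is a small optional check rather than part of the original workflow; the helper name is hypothetical, and the module name `libzim` is taken from the `Extension` definition in `setup.py`.

```python
import importlib.util


def extension_is_importable(module_name: str = "libzim") -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None
```

Running `extension_is_importable()` from the project root should return `True` once the build has succeeded.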
### A Pythonic API
This implementation using the `ZimArticle` wrapper enables a very pythonic property-based API.
`example.py`
```python
import libzim
test_content = '''<!DOCTYPE html>
<html class="client-js">
<head><meta charset="UTF-8">
<title>Monadical</title>
<h1> ñññ Hello, it works ñññ </h1></html>'''
# Create a filled article
article = libzim.ZimArticle(namespace="A", url = "Monadical", title="Monadical", content=test_content, should_index = True)
print(article.longurl)
print(article.url)
# Create an empty article then fill it
article2 = libzim.ZimArticle()
article2.content = test_content
article2.url = "Monadical_SAS"
article2.title = "Monadical SAS"
```
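The difference between `article.longurl` and `article.url` in the example is the namespace prefix. Assuming the `{NAMESPACE}/{url}` format described in the docstrings, the relationship can be sketched in plain Python; the two helper functions are hypothetical and only mirror the validation and slicing done in the Cython classes.

```python
VALID_NAMESPACES = ["-", "A", "B", "I", "J", "M", "U", "V", "W", "X"]


def long_url(namespace: str, url: str) -> str:
    """Build a long url of the form {NAMESPACE}/{url} after validating the namespace."""
    if namespace not in VALID_NAMESPACES:
        raise ValueError("Invalid Namespace")
    return f"{namespace}/{url}"


def strip_namespace(longurl: str) -> str:
    """Drop the leading '{NAMESPACE}/' prefix, as the main_page getter does with [2:]."""
    return longurl[2:]


def is_valid_namespace(namespace: str) -> bool:
    """Report whether long_url would accept this namespace."""
    try:
        long_url(namespace, "")
    except ValueError:
        return False
    return True
```

Under that assumption, the first article in the example would report a long url of `A/Monadical` and a plain url of `Monadical`.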
Using the creator is also straightforward:
```python
# Write the articles
zim_creator = libzim.ZimCreator('test.zim',main_page = "welcome",index_language= "eng", min_chunk_size= 2048)
# Add article to zim file
zim_creator.add_article(article)
zim_creator.add_article(article2)
```
### Summary of the solution using the first strategy
![](https://docs.monadical.com/uploads/upload_bd5d4032c302a212cdd36955504e9d88.png)
In this solution, the C++ wrapper class `ArticleWrapper` implements the virtual functions of the abstract class `zim::writer::Article` (e.g., `getData()`, `getTitle()`) and declares public data members (e.g., `title`, `content`).
The Cython wrapper class `ZimArticle` holds a pointer to a new C++ `ArticleWrapper` object. Its constructor fills the object's public data members, and Python properties act as accessor functions for them.
## Part 2: Writing virtual functions in Cython
The first strategy produces a neat pythonic API. However, it might involve holding a huge amount of content in memory and might not be appropriate for every use case. Let’s say that, instead of getting the article content from a short Python string, we’re getting it from a video stream reader. With the first solution, we would need to hold all the content in memory (std::string content) until it could be written to disk by `libzim`.
Think of this strategy as working a bit like service in a restaurant: the kitchen has to prepare and store the whole table’s order before it can go out. This is fine if you’re dealing with tables of four, but if a hundred customers walk in (i.e. someone wants to download Wikipedia), you’re going to need a different approach.
Wouldn’t it be nice if we could implement the `getTitle()` function--and the others required by the `libzim` interface--directly in Python, so that the data is lazy loaded by `libzim` when needed? In this case, the data would be siphoned from the reader to the disk by libzim. This was the second strategy we considered.
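To make the contrast concrete, here is a minimal pure-Python sketch of eager versus lazy articles. Everything in it is hypothetical: `write_all` stands in for libzim pulling data at write time, and the loaders stand in for something like a video stream reader.

```python
from typing import Callable, Iterable, List


class EagerArticle:
    """First strategy: the whole content is held in memory up front."""

    def __init__(self, content: bytes) -> None:
        self.content = content

    def get_data(self) -> bytes:
        return self.content


class LazyArticle:
    """Second strategy: content is produced only when the writer asks for it."""

    def __init__(self, loader: Callable[[], bytes]) -> None:
        self._loader = loader

    def get_data(self) -> bytes:
        return self._loader()


def write_all(articles: Iterable) -> int:
    """Hypothetical stand-in for libzim: pull each article's data as it is written."""
    return sum(len(article.get_data()) for article in articles)


load_log: List[str] = []


def make_loader(name: str, size: int) -> Callable[[], bytes]:
    """Create a loader that records when it actually runs."""
    def loader() -> bytes:
        load_log.append(name)
        return b"x" * size
    return loader


lazy_articles = [LazyArticle(make_loader("a", 10)), LazyArticle(make_loader("b", 20))]
loaded_before_write = list(load_log)  # nothing has been loaded yet
total_written = write_all(lazy_articles)
```

With lazy articles, no content exists until `write_all` asks for it, and each piece can be released as soon as it has been written.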
Let’s try it out.
Cython not only allows us to call C code from Python, it also allows us [to make declarations from a Cython module available for use by external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#using-cython-declarations-from-c), thus exposing a public API.
To implement this strategy, we will declare a public Cython API that will receive a pointer to a Python object and a function to call in the Python object. The return value from Python Land will be passed to C++ via the public API.
![](https://docs.monadical.com/uploads/upload_8e468898c543f7410bdea5b7aadfea45.png)
### Getting an article title in C++: A sample execution journey
To fully understand this strategy, let’s look at how an article title ends up available in C++ Land. As you may notice in the diagram above, our C++ wrapper class (called `ZimArticleWrapper` in the implementation below) no longer declares public data members; instead it holds a pointer to a Python object, `PyObject *obj`. When a Cython `ZimArticle` is constructed, it creates a new wrapper object and passes it a pointer to `self` (the current `ZimArticle`). This makes the Cython `ZimArticle` reachable from the C++ wrapper object.
The next step is to expose, through a public API, a Python/Cython function `cy_call_fct` that is callable from C++ Land. It takes a pointer to a Python/Cython object (a `ZimArticle`) and the name of a method to call on that object, and returns the result to the caller in C++ Land.
Then we define member functions in C++ Land (e.g., `getTitle()`) that use the public API to obtain the data from Python/Cython Land.
Let’s follow the call from C++ Land inside libzim.
When, deep inside `libzim`, `getTitle()` is called on a wrapper object (an instance of a subclass of `zim::writer::Article`), the implementation returns whatever it obtains from calling `cy_call_fct(obj, "get_title")`. That public API function, exposed from Python/Cython Land, evaluates the method `get_title` on `obj`, which is the `ZimArticle`. This way, we end up with the string `"Hello"` lazily loaded and available in C++ Land.
What we have constructed is the equivalent of implementing C++ virtual functions in Python/Cython.
![show me the code quote](https://docs.monadical.com/uploads/upload_fcc2f5b4843698a64cdaf7dd740668ad.jpg)
### Implementation
First, let’s declare `ZimArticle` with an `__init__` constructor that saves a pointer to a new `ZimArticleWrapper`, passing a pointer to self (`<cpy_ref.PyObject*>self`) as the initialization argument.
`libzim.pyx`
```python3
cimport libzim
cimport cpython.ref as cpy_ref
from cython.operator cimport dereference
from libcpp.string cimport string

cdef class ZimArticle:
    cdef ZimArticleWrapper* c_article

    def __init__(self):
        # Hand the C++ wrapper a pointer back to this Python object
        self.c_article = new ZimArticleWrapper(<cpy_ref.PyObject*>self)

    def get_url(self):
        raise NotImplementedError

    def get_title(self):
        return "Hello"
```
Now, let’s declare a public API function that returns a string from evaluating a function on Python/Cython objects.
`libzim.pyx`
```python3
import traceback  # needed for traceback.format_exc() below

cdef public api:
    string string_cy_call_fct(object obj, string method, string *error) with gil:
        """Lookup and execute a pure virtual method on ZimArticle returning a string"""
        try:
            func = getattr(obj, method.decode('UTF-8'))
            ret_str = func()
            return ret_str.encode('UTF-8')
        except Exception:
            # Report the failure to C++ through the error out-parameter
            error[0] = traceback.format_exc().encode('UTF-8')
        return b""
```
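The lookup-and-call logic can be exercised in plain Python before any C++ is involved. The sketch below is a rough equivalent, not the Cython function itself: instead of writing through a `string *error` out-parameter it returns a `(result, error)` pair, and both `call_method_returning_bytes` and `HelloArticle` are hypothetical names.

```python
import traceback
from typing import Tuple


def call_method_returning_bytes(obj: object, method: bytes) -> Tuple[bytes, bytes]:
    """Look up a method by name, call it, and return (result, error) as bytes."""
    try:
        func = getattr(obj, method.decode("UTF-8"))
        return func().encode("UTF-8"), b""
    except Exception:
        # Mirror the Cython version: empty result plus a formatted traceback
        return b"", traceback.format_exc().encode("UTF-8")


class HelloArticle:
    """Stand-in for a ZimArticle with one implemented and one missing method."""

    def get_title(self) -> str:
        return "Hello"

    def get_url(self) -> str:
        raise NotImplementedError


title, title_error = call_method_returning_bytes(HelloArticle(), b"get_title")
url, url_error = call_method_returning_bytes(HelloArticle(), b"get_url")
```

The second call shows why the error channel matters: a Python exception cannot cross into C++ on its own, so it is captured as text and the C++ side can turn it into a `std::runtime_error`.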
With the public API defined, we will be able to call `string_cy_call_fct` from C++ code by including the Cython auto-generated header file `libzim_api.h`. Let’s implement the C++ wrapper that uses the Cython public API to obtain the title:
`lib.h`
```cpp
// -*- c++ -*-
#ifndef libzim_LIB_H
#define libzim_LIB_H 1

// Forward declaration so this header does not need to include Python.h
struct _object;
typedef _object PyObject;

#include <zim/zim.h>
#include <zim/writer/article.h>
#include <string>

class ZimArticleWrapper : public zim::writer::Article
{
public:
    PyObject *m_obj;

    ZimArticleWrapper(PyObject *obj);
    virtual ~ZimArticleWrapper();
    virtual std::string getTitle() const;

private:
    std::string callCythonReturnString(std::string) const;
};

#endif // !libzim_LIB_H
```
`lib.cxx`
```cpp
#include <Python.h>
#include "lib.h"

// The file below is autogenerated by Cython and declares both
// import_libzim() and string_cy_call_fct()
#include "libzim_api.h"

#include <cstdlib>
#include <iostream>
#include <stdexcept> // std::runtime_error

/*
#########################
#       ZimArticle      #
#########################
*/

ZimArticleWrapper::ZimArticleWrapper(PyObject *obj) : m_obj(obj)
{
    // Load the Cython module's public API before using it
    if (import_libzim())
    {
        std::cerr << "Error executing import_libzim!\n";
        throw std::runtime_error("Error executing import_libzim");
    }
    else
    {
        // Keep the Python object alive for as long as this wrapper exists
        Py_XINCREF(this->m_obj);
    }
}

ZimArticleWrapper::~ZimArticleWrapper()
{
    // The GIL must be held while touching Python reference counts
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    Py_XDECREF(this->m_obj);
    PyGILState_Release(gstate);
}

std::string ZimArticleWrapper::callCythonReturnString(std::string methodName) const
{
    if (!this->m_obj)
        throw std::runtime_error("Python object not set");

    std::string error;
    std::string ret_val = string_cy_call_fct(this->m_obj, methodName, &error);
    if (!error.empty())
        throw std::runtime_error(error);

    return ret_val;
}

std::string
ZimArticleWrapper::getTitle() const
{
    return callCythonReturnString("get_title");
}
```
Finally, to use the wrapper from Cython we need to describe the interface:
`libzim.pxd`
```python3
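from cpython.ref cimport PyObject
from libcpp.string cimport string

# The base class must be declared before it can be named below
cdef extern from "zim/writer/article.h" namespace "zim::writer":
    cdef cppclass Article:
        pass

cdef extern from "lib.h":
    cdef cppclass ZimArticleWrapper(Article):
        ZimArticleWrapper(PyObject *obj) except +
        const string getTitle() except +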
```
## Conclusion
This article presented two strategies for implementing C++ virtual functions with Cython. The first consisted of implementing the functions in C++ and wrapping them with Cython. The required data was passed from Cython/Python and data copies were kept in C++ Land. Although this strategy works in general, and enables a neat pythonic API, we needed an approach that could better accommodate the huge amount of scraped content involved in our project.
The second strategy was a much better fit for us, since there’s no need to hold all that content in memory. It also allows code to be implemented in Python instead of C++, making it more accessible, as well as easier and faster to develop. Finally, it means that we can program to an interface rather than to an implementation: unlike Strategy One, which requires an intermediate implementation in C++, Strategy Two lets us provide the implementation directly from Cython/Python.
If you thought that was cool, check out our other projects [here](https://monadical.com/projects.html).
References
====
- https://github.com/cython/cython/wiki/enchancements-inherit_CPP_classes
- https://stackoverflow.com/questions/10126668/can-i-override-a-c-virtual-function-within-python-with-cython
- https://stackoverflow.com/questions/32257889/passing-cython-class-object-as-argument-to-c-function
- https://groups.google.com/forum/#!topic/cython-users/vAB9hbLMxRg