How to Migrate from Firebase to Django (or another SQL ORM) - CodiMD
  231 views
<center> <big> How to Migrate from Firebase to Django === </big> *Written by --- Karim Marzouq Originally published 2020-09-29 on the [Monadical blog](https://monadical.com/blog.html).* Strategy and tips on how to migrate from Firebase to Django (or another SQL ORM) </center> **Contents:** [TOC] In recent years, Firebase has emerged as a strong contender in the NoSQL space, and in web application programming in general. It has allowed frontend developers with little to no backend knowledge to develop applications quickly by leveraging Firebase’s backend-as-a-service model. Django is an open-source web application framework that implements the [Model View Controller](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller) pattern. Through Django, you also have access to the large and mature open-source Python ecosystem. Depending on your use case, this may save you an [unexpected bill of $30k](https://hackernoon.com/how-we-spent-30k-usd-in-firebase-in-less-than-72-hours-307490bd24d), since Firebase charges per request, while Django + a SQL database can be hosted on a single, fixed $5/month VPS, or scaled horizontally to however much you can afford. Indeed, one of the main reasons a recent client of ours decided to make the switch from Firebase to Django was that, as his app became more popular, the cost of Firebase was getting to be too high. In this blog post, I offer a migration path from Firebase to Django. ### Step One: Get your models in order The very first step in migrating your Firebase application to Django is to stabilize your models. By this, I mean that you should have a concrete idea of how your data is structured and [related](https://en.wikipedia.org/wiki/Relational_database). Take a look at the objects in your Firebase store and try to follow them as they move through your frontend application. Which fields are actually used? Which fields can be normalized? It's hard to overstate how important this step is. Look at your data in the collections interface of Firebase first, then as they flow into your application, and, finally, start defining your Django models. If you have no experience with SQL, then you will need to familiarize yourself with at least [one-to-many relationships](https://docs.djangoproject.com/en/3.0/topics/db/examples/many_to_one/). Armed with this knowledge, you should have a Django model for each of your collections in Firebase. If you are a developer who relies on Firebase, this will likely be the hardest step for you, since, unlike with Firebase, you will now need to make concrete decisions about how your objects should look and behave. However, just because you are using [Django's migrations](https://docs.djangoproject.com/en/dev/topics/migrations/) to manage our SQL database schema, this does not mean that the decisions for a model are set in stone. Indeed, the migration system exists precisely for that: propagating changes you make to your models into your SQL database schema. Django provides a [JSON field](https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.JSONField) that you can use as a 'catchall' for fields where you aren't sure of their structure or presence. However, I don't recommend using it as your primary method for storing data - this would cost you the goodies that come with Django and its ORM, namely: type checking, serialization, forms, auto-indexing, and migrations. But what about all those Firebase objects that contain data that you don't know what to do with, or that were derived using Cloud Functions? And what about data that you stored in objects at some point and didn't end up using, but don't want to leave behind in Firebase once you migrate? For those I suggest using a JSON field in Django and calling it `metadata`. I’ll expand on this some more below. ### Step Two: Migrate the user model The next step is to ensure you have a working user/auth model. This is because most applications' models will have relationships with the user model. Firebase makes auth exceptionally easy with JWTs, which can optionally include claims that replace a complicated auth model. (E.g: you don’t need groups and permissions checks if each JWT includes all permissions the user has access to.) Luckily with the [Firebase cli](https://github.com/firebase/firebase-tools) you can do: ``` firebase auth:export accounts.json --format=json --project <project_name> ``` and it will generate a JSON file with all your Firebase users' details, including the hash and salt for their passwords if they set up their account with password-based authentication. For the user models loading, this script can be used: ```python3 import json from django.db import transaction from django_firebase_scrypt import FirebaseScryptPasswordHasher from app_name.models import User # or wherever your user model is def run(*args): with open("../data/accounts.json") as users_file: users = json.load(users_file)["users"] downloaded_from_firebase = len(users) imported_users = 0 for firebase_user in users: # ensure each user has displayName if 'displayName' not in firebase_user: firebase_user["displayName"] = firebase_user["email"] with transaction.atomic(): # use transactions to guard against data corruption during loading # Get first_name & last_name for django User model by bisecting displayName if len(firebase_user["displayName"].split(" ")) >= 2: first_name = firebase_user["displayName"].split(" ", maxsplit=1)[0][:30] last_name = firebase_user["displayName"].split(" ", maxsplit=1)[1] else: first_name = firebase_user["displayName"][:30] last_name = firebase_user["displayName"] # fields common to both normal login & Google Sign-In defaults = { "email": firebase_user["email"], "firebase_id": firebase_user["localId"], "first_name": first_name, "last_name": last_name, "username": firebase_user["email"], } if 'passwordHash' in firebase_user: django_password = f"{FirebaseScryptPasswordHasher.algorithm}${firebase_user['salt']}${firebase_user['passwordHash']}" defaults["password"] = django_password django_user, created = User.objects.update_or_create( email=firebase_user["email"], defaults=defaults ) django_user.save() # add any logic you want after loading a user imported_users += 1 print(f"Imported or updated {imported_users} users from a firebase dump of {downloaded_from_firebase} users.") ``` This script iterates through the file downloaded from the CLI and creates accounts for the users in it, setting their passwords to the hash+salt that were obtained from Firebase. Note that I use [update_or_create](https://docs.djangoproject.com/en/3.1/ref/models/querysets/#update-or-create) to allow the script to be [idempotent](https://en.wikipedia.org/wiki/Idempotence). Additionally, [transactions](https://en.wikipedia.org/wiki/Database_transaction) are used to ensure no broken database objects (rows) are created. Also note that not much can be done about users that used Sign-In with Google - they will have to reauthenticate. [You should use Django REST Framework to set up JWT auth.](https://simpleisbetterthancomplex.com/tutorial/2018/12/19/how-to-use-jwt-authentication-with-django-rest-framework.html) To run this script, I recommend adding it to a scripts folder at the root of your Django project and using [RunScript from the excellent django-extensions package](https://django-extensions.readthedocs.io/en/latest/runscript.html) to run it. You will also need to install [django-firebase-scrypt](https://github.com/mcsimps2/django-firebase-scrypt) to be able to save the hashed password correctly. If you have set up everything correctly in this section, you should be able to sign in using password authentication. ### Step Three: Migrating all other collections Finally, you are ready to start migrating actual data! Since the Firebase CLI doesn't offer us a way to export our collections in JSON format, you will have to 'stream' them (i.e: read them) one by one from Firebase using the Firebase Python SDK. Make sure you install the [Firebase Admin Python SDK](https://github.com/firebase/firebase-admin-python). Then create a new file, firebase_helpers.py, in your application directory: ```python from django.conf import settings import firebase_admin from firebase_admin import credentials from firebase_admin import firestore # Use a service account if not firebase_admin._apps: cred = credentials.Certificate(settings.FB_ADMIN_PATH) # path to fbadmin file firebase_admin.initialize_app(cred) db = firestore.client() ``` Now you have an instance of the Firestore database, which can be used to query our collections or stream them one by one. At Monadical, when we were helping a client achieve this, we decided to first download all the objects in a collection, pickle them, and save them to a file. We then dealt with how to load them in a separate script. That allowed for faster iteration on the loading of the objects from Firebase, since we didn’t have to stream them directly to the Django database, and could instead load them from the disk as pickled objects. To download the files in your collection, you need to write a script similar to the following: ```python3 import pickle from app_name.firebase_helpers import db def run(*args): blogs_dict = {"blogs": []} blogs = db.collection('blogs') for blog in blogs.stream(): blog_dict = blog.to_dict() blog_dict["id"] = blog.id blogs_dict["blogs"].append(blog_dict) print(f"Saving blog with id {blog.id}") blogs_filehandler = open('../data/blogs.objs', 'wb') pickle.dump(blogs_dict, blogs_filehandler) print(f"Saved {len(blogs_dict['blogs'])} blogs") ``` You will notice two things here; the first is that I am serializing them as pickled dicts, not JSON. That's because JSON objects are serialized to dicts in Python anyway, so there’s no need for an intermediary step. Second, I add the object id (the blog id in this case) to the dict of each blog object - this is to have the object as complete as possible when it enters the Django database. Next, you will have to load them. Similar to the script loading the users above, the goal is to be as idempotent as possible. ```python3 import pickle from django.db import transaction from app_name.models import Blog from app_name.models import User def run(*args): blogs_filehandler = open('../data/blogs.objs', 'rb') blogs_dict = pickle.load(blogs_filehandler) downloaded_from_firebase = len(blogs_dict["blogs"]) invalid_blog_objects = 0 failed_to_import_blogs = 0 imported_blogs = 0 for blog in blogs_dict["blogs"]: with transaction.atomic(): try: owner = User.objects.get(firebase_id=blog["owner"]) defaults = { "name": blog["title"], "owner": owner, "metadata": {"firebase_id": blog["id"]} } if 'some_key' in blog: defaults["metadata"] = {"firebase_id": blog["id"], "some_key": blog["some_key"]} django_blog, created = Blog.objects.update_or_create( metadata__firebase_id=blog["id"], defaults=defaults) django_blog.save() imported_blogs += 1 except User.DoesNotExist: print(f"Couldn't find owner with firebase_id {blog['owner']} of blog {blog['id']}") failed_to_import_blogs += 1 print( f"Imported or updated {imported_blogs} of a total {downloaded_from_firebase} from firebase. {invalid_blog_objects} were invalid & {failed_to_import_blogs} couldn't be loaded because their owner couldn't be found in Django DB." ) ``` Note that as part of the loading of those pickled objects, I added the `firebase_id` key to the aforementioned `metadata` JSON field in our Django objects. This allows you to reference this object with `firebase_id` later should the need arise. I also check for the existence of `some_key` and, if found, it's added to the `metadata` field. This will allow us to access this data in the future, perhaps also adding it as `some_column` in the future for the Django objects. If the migration is a multi-week process and you want to be as close to the current Firebase production collections as possible, simply run those scripts as cron jobs on a daily basis. Their failures will point to problems that you have not addressed in the serialization of Firebase objects to your Django database. In the course of helping our client migrate to Django, we did this process several times to ensure that every object in Firebase could move to Django. Each time we ran the script and it failed, we examined the object that raised the exception thoroughly and added the necessary logic to load it. Finally, there are use cases for which I do recommend using Firebase, since replicating the same functionality in Django would require a more in-depth engineering effort. Examples include real time apps such as whiteboards, and [push notifications](https://github.com/xtrinch/fcm-django). Still, even in those cases, Firebase can be used for part of your application, rather than as the main tool for the whole thing. Thanks to Juan Diego Caballero for his valuable contributions to this article. ### Further Reading A common problem called the N+1 select problem in ORMs like Django’s: https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping A new open-source project called Supabase that offers real time subscriptions on Postgresql databases, thus mimicking a core feature of Firebase: https://github.com/supabase A tool to convert Firebase databases (Firestores) to Postgresql-based GraphQL backends: https://github.com/hasura/firebase2graphql --- <center> <img src="https://monadical.com/static/logo-black.png" style="height: 80px"/><br/> Monadical.com | Full-Stack Consultancy *We build software that outlasts us* </center>





Back to top



Some other posts: