<center>

<img src="https://i.imgur.com/HYm9FeC.png" height="100px"/>

# Architecting a Banking Service for <br/> Real-Time Gaming at OddSlingers

> Design strategies and common pitfalls that come into play when designing a banking service to handle concurrent transactions in a distributed environment.

*Originally published 2018-04-20 on [labs.OddSlingers.com](https://labs.oddslingers.com/posts/Designing-A-Banking-System.html).*

</center>

Examples in this article refer specifically to the Django app that powers our real-time poker & sidebetting platform at OddSlingers.com, but the concepts apply equally well to SQLAlchemy, Active Record, and other ORMs and platforms.

**Also available as [a talk](https://github.com/pirate/django-concurrency-talk) (given at PyGotham NYC 2018 and PyCon Colombia 2019).**

---

## Intro

Our goal today is to design a banking service for an app which can handle a user depositing and withdrawing money, and using that money to cash in and out of poker games. In online poker, users buy chips to put into a "wallet" and use them to play at multiple poker tables simultaneously, occasionally "rebuying" more chips or cashing out for dollars.

**Our aim is to prevent concurrent requests like game actions or money transfers from ever leaving our data in an inconsistent state.**

<center>

![](https://i.imgur.com/49bHDqr.png)

</center>

The techniques we cover aren't unique to poker & gaming though; data integrity is equally critical in finance, healthcare, and any other sector that handles sensitive information.

This post expands on the basics introduced in our much shorter post: [Concurrent-Write Safety in Django](https://invalidpatent.wordpress.com/2016/08/03/two-approaches-to-concurrent-write-safety-in-django/). It's not required reading though, as I'll be re-introducing all the same concepts below.

## Money Basics: floating-point math is risky

Before we get into SQL transactions and atomicity, we need to think about the basics: what type should we use to store dollar amounts?

Floats only store an approximation of most decimal values, so any arithmetic on them can introduce accumulating floating-point error. Unless you've measured and absolutely need the performance benefit of floats, why risk it?

:::danger
```python
>>> 0.1 + 0.2
0.30000000000000004
```
Floating-point math introduces error that can accumulate.
:::

The answer is to use `Decimal`, or a library like `money` or `currencies`. These all store money internally with perfect precision, either as a string or as an `int` with an exponent, preventing floating-point precision loss.

:::success
```python
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.2')
Decimal('0.3')
```
Much better.
:::

For maximum confusion, I'm going to use float literals in the examples below, but in a real app you should use `Decimal('0.00')` or one of the currency libraries.

:::danger
Another reason to work with `Decimal` only:
```python
>>> round(1.5)
2
>>> round(2.5)
2
>>> round(3.5)
4
>>> round(4.5)
4
```
`round()` in Python 3 uses "banker's rounding" (round-half-to-even), so don't use it on dollar amounts.
:::

:::success
Get the expected rounding behavior with:
```python
>>> decimal.Decimal('2.5').quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)
Decimal('3')
```
:::

### Beware of math

Any time you're doing multiplication or division on dollar amounts, make sure not to penny-shave by accident.
Penny-shaving, aka [salami slicing](https://en.wikipedia.org/wiki/Salami_slicing), happens when math operations get rounded down to the nearest cent, leading to an accumulation of fractional cents that go unaccounted for in the system.

:::danger
```python
class Payout(models.Model):
    ...
    amt = models.DecimalField(max_digits=20, decimal_places=2)

Payout.objects.create(amt=100000/7)
# saves: 14285.71 instead of 14285.714285714
```
Do this 10 million times and suddenly you've lost track of ~0.0043 * 10m ≈ $42,857.
:::

If you want to know what happens to salami slicers, just check out:

- Superman III
- Hackers
- Office Space
- http://www.alphr.com/news/internet/201252/hacker-takes-50-000-a-few-cents-at-a-time

<center>

<img src="https://i.imgur.com/IARMdBj.jpg" height="300px"/>

</center>
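If you do need to divide a dollar amount, such as splitting a pot between several winners, account for the leftover fraction of a cent explicitly instead of letting it disappear. Here's a minimal sketch of one way to do that; the `split_evenly` helper and the cents-based approach are my own illustration, not code from our system:

```python
from decimal import Decimal

def split_evenly(total: Decimal, n_winners: int) -> list:
    """Split `total` into n payouts that always sum back to exactly `total`.

    Assumes `total` has at most two decimal places.
    """
    cents = int(total * 100)                    # work in integer cents to avoid rounding
    base, remainder = divmod(cents, n_winners)  # 0 <= remainder < n_winners leftover cents
    # hand the leftover cents out one-by-one instead of discarding them
    payouts = [base + 1 if i < remainder else base for i in range(n_winners)]
    return [Decimal(p) / 100 for p in payouts]

# every cent is accounted for, nothing is shaved off:
assert sum(split_evenly(Decimal('100000.00'), 7)) == Decimal('100000.00')
```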
## Avoid concurrency if you can

An excellent article by Tyler Neely, [Fear and Loathing in Lock-Free Programming](https://medium.com/@tylerneely/fear-and-loathing-in-lock-free-programming-7158b1cdd50c), describes various approaches to dealing with concurrency and distributed systems, but the general gist is: **avoid it at all costs**.

:::danger
Failure if two threads execute withdrawals at once:
```python
if user.balance == 100:
    user.balance -= 100  # 2nd thread can .save() a 100 withdrawal before us
    user.save()          # balance is now 0 but should be -100
```
aka a "Time of check to time of use" bug ([TOCTTOU](https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use)).
:::

<center>

![](https://i.imgur.com/lt2htbT.png)

*The last write above is stale, as it is based on the outdated assumption that the user still has 100 chips available.*

</center>

If you can design a system to linearize everything into a single queue instead of processing things in parallel, do it. There's never a good reason to torture yourself with the additional complexity of multiple processes & locks if you don't absolutely need it. Often you can optimize single-core performance for a simple task high enough that you never need to run things on multiple cores; redis is a great example of a high-performance single-threaded datastore.

### Designing a linearizable queue for transactions

In our wallet app, one approach we could take is to queue all transactions into a list of serialized tuples that can be sent across a network: `[(ts, condition, transaction), ...]`.

For example, let's say we have a user whose ID is `241`, and they buy some chips and then sit down at a poker table. We need to deposit the chips into their wallet, and then take them out in order to give them a spot at the table. We'll need to make sure they have enough chips to sit at the table, so we use the check `balance_gt(user_id, amt)` to confirm they have enough chips.

```python
transactions = [
    # timestamp   condition              action
    (1523518620, "can_deposit(241)",    "deposit_usd(241, 50)"),
    (1523518634, "balance_gt(241, 50)", "buy_chips(241, 50)"),
]
```

<center>

![](https://i.imgur.com/SPszJni.png)

*Stale writes can't happen if only one process is reading & writing at a time.*

</center>

Assuming each action only modifies one row at a time, you can process the transactions with a simple, single-threaded loop like this:

```python
while transactions:
    ts, condition, action = transactions.pop(0)  # pop the *oldest* transaction first
    if eval(condition):
        eval(action)  # make sure to sanitize any user-inputted strings
```

There are some big caveats with this approach though, namely, you can **only ever have a single process touching the transaction table, and it needs to have a lock on the entire table to prevent anyone else from modifying it**. If you ever run a second transaction-handling loop or modify the table outside of the loop, you'll run into hellish scenarios where transactions process out of order, or tables are modified between the `if condition` and `do action` portions, creating invalid data.

If you think you can get away with storing all your money data in one table, and you don't mind locking the entire table and limiting its use to a single thread, stop reading here, you're done!

> See [appendix 1](#appendix) for a full example of a single-threaded queue processor.

With only a single process modifying any locked tables, you'll never run into a scenario where the data you write is stale, or multiple people attempt to modify the same value. You can even have multiple servers submitting transactions to your central queue; as long as the timestamps are accurate, things will be processed in the order received and will never execute at the same time. Synchronizing time between multiple servers is *very hard* though, so don't rely on having strictly ordered transactions unless your servers are synchronized via [atomic clocks](http://www.wired.com/wiredenterprise/2012/11/google-spanner-time/)!

## Designing thread-safe banking transactions

### Single-threaded code doesn't scale

There are a few common scenarios where the above design breaks down, usually because locking the entire table you're writing to isn't practical, e.g. what if you need to add a row to one table, and update a row in a different table that's used elsewhere in the system, at the same time:

```python
BalanceTransfer.objects.create(src=Cashier, dst=user, amt=amt)
PokerPlayer.objects.filter(user_id=user.id).update(chips=F('chips') + amt)
```

Suddenly you'd not only need to lock the entire `BalanceTransfer` table, but also the `PokerPlayer` table, which is unacceptable if you're running hundreds of games at the same time which need access to those tables.

To do this properly without incurring the massive performance penalty of whole-table locks, we have to introduce a few new concepts: **compare-and-swaps & atomic transactions**.

A SQL transaction is basically a way of saying "perform all these operations at once, and if any of them fail, roll back and don't commit any of them". We use an atomic transaction in the earlier single-threaded example to perform the `action` and the `delete` together. If the `action` writes fail for any reason, we don't want to `delete` the `QueuedTransaction`, because then we'd lose the customer's money! `atomic` ensures that if any exception is thrown, invalid partial changes are never committed.
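As a concrete sketch of that pattern (using a hypothetical `apply_action()` helper and queued-transaction row, not real names from our codebase), the balance writes and the removal from the queue succeed or fail as one unit:

```python
from django.db import transaction

def process(queued_txn):
    with transaction.atomic():
        apply_action(queued_txn.action)  # hypothetical helper that performs the balance writes
        queued_txn.delete()              # only removed from the queue if the writes succeed
    # if apply_action() raises, the whole block rolls back:
    # no partial writes are committed and the row stays queued for retry
```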
### Atomic compare-and-swap operations

Atomic transactions alone aren't magic though; we still need to avoid [TOCTTOU](https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use) bugs, which means the `condition` and `action` need to be a single statement, to avoid data changing between the read and the write. For example, to subtract 50 chips from a user's balance, we might write:

:::danger
```python
user = User.objects.get(id=241)
if user.balance == 100:
    user.balance = 50
    # meanwhile, in another thread, user adds 100 new chips
    user.save()  # now user's balance is 50 instead of 150!!
```
Leaving any lines between the `if` and the `.save()` allows another process to change the data during that time. Make them a single compare-and-update statement, or lock the row until the `.save()` is over.
:::

Suddenly we've made them lose 100 chips when we only wanted to subtract 50! Because the balance can change between the check and the `.save()`, we write our stale `user.balance` value instead of the correct fresh value from the db. To fix this we need to perform the entire operation in a single SQL statement:

:::success
```python
updated = User.objects.filter(id=241, balance=100).update(balance=50)
if not updated:
    raise Exception('Someone else modified balance!')
```
This "compare-and-swap" is a single atomic operation that will only succeed if the balance is exactly what we expect, preventing TOCTTOU bugs.
:::

Now we're only updating the user object if the balance is exactly what we expect it to be! If it's different from what we expected, the `.filter()` will match no rows and the update will fail. The great part is that this is a single, atomic SQL statement; there's no way for the balance to change between the check and the update.

This isn't always easy though, especially for operations involving multiple rows, or for modifying values that depend on one or more existing values in the row.

### Locking rows when single atomic compare-and-swaps aren't possible

For more complex operations, Django provides the `.select_for_update()` queryset method, which locks the rows in question, blocking other threads from modifying them before we save our changes. Any other thread attempting to process a write will block until the lock is released, keeping our view of the data intact until the transaction is completed.

Be careful though: `select_for_update()` only provides a simple write lock to ensure our read data isn't stale. It doesn't prevent other threads from waiting for our lock to release and then writing stale data on top of our last commit. This is a particularly gnarly edge case that is often neglected, leading to subtle bugs when write transactions don't use compare-and-swaps to make sure writes commit only when the state is what they expected it to be.

:::success
```python
with transaction.atomic():
    sold = Widget.objects.filter(sold__gt=2018).select_for_update()  # lock the sold widgets
    yr_revenue = sum(calc_revenue(widget) for widget in sold)
    report = YearlyReport.objects.create(ts=timezone.now(), revenue=yr_revenue)
    if yr_revenue < 100_000:
        raise CompanyFailed('Get a better business model.')
```
For batch processing where many rows need to stay consistent with respect to each other, lock all dependent rows until the transaction is complete.
:::
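One caveat worth flagging when locking multiple rows: if two threads lock the same rows in different orders, they can deadlock waiting on each other. A common fix is to always acquire locks in a consistent order, e.g. by primary key. A sketch, assuming a hypothetical `UserBalance` model with a `total` field (similar to the one introduced in the next section):

```python
from django.db import transaction
from django.db.models import F

def transfer_chips(from_id, to_id, amt):
    with transaction.atomic():
        # lock both balance rows in pk order, so concurrent A->B and B->A
        # transfers can't deadlock waiting on each other's locks
        rows = (UserBalance.objects
                .select_for_update()
                .filter(user_id__in=[from_id, to_id])
                .order_by('user_id'))
        balances = {row.user_id: row for row in rows}  # evaluating the qs acquires the locks
        if balances[from_id].total < amt:
            raise ValueError('Insufficient balance')
        UserBalance.objects.filter(user_id=from_id).update(total=F('total') - amt)
        UserBalance.objects.filter(user_id=to_id).update(total=F('total') + amt)
```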
## Store transaction history, just the total, or both?

When designing a banking system, you might start out by just storing a log of each user's money-in & money-out Transfers, without keeping a separate field to store the total on the User. It's clean because the total balance can be derived by adding the transfers up, and there's no duplication of data; what's not to like?

```python
class Transfer(models.Model):
    src = models.ForeignKey(User)
    dst = models.ForeignKey(User)
    amt = models.DecimalField(max_digits=20, decimal_places=2)
    timestamp = models.DateTimeField(auto_now=True)

def user_balance(user):
    """adds up all credits & debits to see user's final balance"""
    credits = Transfer.objects.filter(dst=user).aggregate(Sum('amt'))['amt__sum'] or 0
    debits = Transfer.objects.filter(src=user).aggregate(Sum('amt'))['amt__sum'] or 0
    return credits - debits
```

<center>

<img src="https://i.imgur.com/J645nHa.png" height="200px">

*Transfers are append-only; user balance is derived by adding up all the history rows.*

</center>

As discussed in the great talk [Turning the Database Inside Out](https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/), viewing the data as an append-only, immutable log of Transfers makes maintaining integrity much simpler. If tables like Transfers are append-only, writes simply need to be strictly ordered by timestamp, which reduces the concurrency problem to a time-synchronization problem (something Google solved by using atomic clocks for [Spanner](https://en.wikipedia.org/wiki/Spanner_(database))).

### Immutable, append-only tables are hard to lock in SQL though

The problem is, any process that wants to withdraw from a user's balance now needs to block *all* new row writes to the `Transfer` table to prevent simultaneous withdrawals, effectively grinding your banking system to a halt for every transaction. For example, if a user only has $50 in their wallet and you want to remove all $50, you'll need to prevent any other withdrawal transactions from being added simultaneously in order to prevent double-spending more than the $50 available.

A solution is to store a separate row that keeps track of the total, alongside the append-only transaction history. A single row-level lock is then all that's needed to prevent other changes to the user's balance; any other transactions will have to wait for the lock to release in order to modify the balance.

```python
class UserBalance(models.Model):
    user = models.OneToOneField(User)
    total = models.DecimalField(max_digits=20, decimal_places=2)

def buy_chips(user, amt):
    with transaction.atomic():
        # Lock the user's balance row, preventing anyone else from changing the total
        # (evaluating the queryset with .get() is what actually acquires the lock)
        UserBalance.objects.select_for_update().get(user=user)

        # Update the user's total and add a Transfer row atomically
        UserBalance.objects.filter(user=user).update(total=F('total') + amt)
        Transfer.objects.create(src=Cashier, dst=user, amt=amt)

    # Exiting the context manager commits the transaction and releases the row lock
```

This effectively prevents anyone else from modifying the user's total, without needing an entire table lock on `Transfer`s.

## Multiple Databases

A useful tactic is to store all financial data in a separate database entirely. This also allows you to point multiple app servers running different versions of your code & schema at one central banking service. Depending on your scale, separate databases might even be legally mandated, to comply with regulations that require PII and financial data to reside in the same country as its users.

Luckily, Django kicks ass and supports nested multi-db transactions with no fuss at all.
```python
try:
    with transaction.atomic(using='default'):
        with transaction.atomic(using='db_one'):
            with transaction.atomic(using='db_two'):
                MyModel_one(...).save(using='default')
                MyModel_two(...).save(using='db_one')
                MyModel_three(...).save(using='db_two')  # raises exception
except IntegrityError:
    ...  # all 3 dbs roll back and don't commit the changes
```

https://stackoverflow.com/questions/31928143/django-transaction-using-multi-db

## Coding techniques

We found that a good design habit is to keep all the sensitive write transactions in one place, rather than mixed into view-layer logic and read queries. This makes them easy to audit and change as a group; you'll thank yourself next time you need to refactor them or fix a bug in an emergency.

E.g. `app_name/transactions.py`: all financial db writes go here.

```python
def transfer_money(src, dst, amt):
    with transaction.atomic():
        ...

def merge_accounts(user_a, user_b):
    with transaction.atomic():
        ...

def archive_account(user):
    with transaction.atomic():
        ...
```

```python
from banking.transactions import transfer_money
```

Have your transactions peer-reviewed by a distributed-systems expert, and stress-test your system's concurrency guarantees using tests like the [Jepsen](https://jepsen.io/) suite. Your revenue and your customers' security depend on your data-integrity practices; don't leave this stuff up to chance or it will come back to bite you.

Log judiciously and redundantly, in a format that's easily parsable if the whole system burns down. Should you need to recompute everyone's balances from transaction logs, you will thank yourself a million times for having the foresight to back everything up.
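In that spirit, one script worth having on hand is an audit that recomputes every balance from the append-only log and flags any stored total that disagrees. A minimal sketch, reusing the `Transfer`/`UserBalance` models and `user_balance()` helper from earlier:

```python
def audit_balances():
    """Cross-check every stored total against the append-only Transfer log."""
    mismatches = []
    for balance in UserBalance.objects.all():
        derived = user_balance(balance.user)  # credits minus debits from Transfer rows
        if derived != balance.total:
            mismatches.append((balance.user_id, balance.total, derived))
    return mismatches  # an empty list means the ledger and the totals agree
```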
## Summary

A single-threaded app can get away with processing transactions using a single ordered queue. As long as the queue is strictly ordered by time, and there's a whole-table lock to prevent modification by other threads, everything should work. However, most real-world apps run on multiple web servers, so writing thread-safe code is absolutely essential. It's very difficult to rewrite non-thread-safe code to be thread-safe later, so good initial design is important.

In most real-world setups you'll need to modify multiple tables at once, and obtaining a whole-table lock isn't practical, so it's wise to use atomic writes and transactions to perform multiple writes together, and roll all of them back if any write fails.

:::danger
**Don't use floats.**

**Don't use `round()`; if you must, always account for the remainder.**

**Don't execute non-thread-safe writes in a parallel environment.**
:::

:::success
**Use `Decimal` instead of `float`, and `Decimal.quantize()` instead of `round()`:**
```python
Decimal('0.35') + Decimal('100.15')
```

**Lock dependent rows during transactions:**
```python
with transaction.atomic():
    players = Player.objects.filter(user=user).select_for_update()
    user.balance = player_balance_sum(players)
    user.save()
```

**Use atomic compare-and-swap operations when you can't lock:**
```python
User.objects.filter(id=user.id, balance__gt=50)\
            .update(balance=F('balance') - 50)
```
:::

:::success
**Span transactions across multiple databases, and keep the financial db separate:**
```python
with transaction.atomic(using='default'):
    with transaction.atomic(using='db_two'):
        MyModel_one(...).save(using='default')
        MyModel_two(...).save(using='db_two')
```

**Log everything, back up your data remotely, and have multiple failovers.**
```bash
pg_dump db_name > /tmp/db_name.sql
zip -r /tmp/db_name.sql.zip /tmp/db_name.sql
rsync -avz /tmp/db_name.sql.zip backups:/backups/db_name_$(date +%s).sql.zip
```

**Set your database isolation level to "Serializable"** (only if you need it).
```python
DATABASES = {..., 'OPTIONS': {'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE}}
```
:::

## Defense in Depth

**Warning: this section may contain overkill.** Evaluate on your own and decide what level of risk you're comfortable with.

Application logic and database integrity are only one piece of the puzzle; to approach high-scale, enterprise-grade integrity all the way down the stack, you'll want a check-summed filesystem and ECC RAM. Even those don't provide a 100% guarantee that things won't go wrong, but security is defense in layers. From [random bit flips](https://twitter.com/whitequark/status/980526967630921728) to poorly synchronized threads, there are many dangers to protect against, and you'll have to think about exactly what risk tradeoffs you're willing to take in your design.

![](https://i.imgur.com/VOgFFlf.jpg)

## Database Settings: Isolation levels

![](https://i.imgur.com/ZvAV3W7.png)

Because implementing databases to be fast **and** correct is *hard*, designers let users choose what level of safety vs. speed they want. The [ANSI isolation levels](https://en.wikipedia.org/wiki/Isolation_(database_systems)) supported by SQL databases go from "Read Uncommitted" (the lowest) to "Serializable" (the highest).

The highest level, "Serializable", means you'll never read stale data within a transaction: you'll see a snapshot of the world at the moment your transaction started, even if other transactions are modifying the same rows concurrently. The lowest level is the opposite, allowing partially complete, uncommitted transactions to contaminate others, causing what's known as "dirty reads".

<center>

![](https://i.imgur.com/bNoP65F.png)

</center>

If you're handling money using only atomic transactions & writes with no locking, you can only safely use a database that supports "Serializable" reads & writes; otherwise you don't have a 100% guarantee that you're never reading or writing stale data.

```python
import psycopg2.extensions

DATABASES = {
    # ...
    'OPTIONS': {
        'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
    },
}
```
https://docs.djangoproject.com/en/1.8/ref/databases/#isolation-level

Of course, this comes with a significant performance penalty, which is why databases don't ship with SERIALIZABLE on by default. To mitigate the performance impact, you can keep financial data in a separate database with a higher isolation level than less sensitive data. **It's often the case that you don't actually need "Serializable" if your locking system is designed right.**

Take a look at the [Jepsen analyses](https://jepsen.io/analyses) to learn more about the isolation guarantees provided by different traditional and distributed SQL databases.

## Further Reading

- [ACID guarantees](https://en.wikipedia.org/wiki/ACID) wiki page
- [Transaction isolation](https://en.wikipedia.org/wiki/Isolation_(database_systems)) wiki page
- [Django docs on isolation levels](https://docs.djangoproject.com/en/1.8/ref/databases/#isolation-level)
- [Raft](http://thesecretlivesofdata.com/raft/) consensus algorithm
- [CockroachDB](https://www.cockroachlabs.com/), [TiDB](https://github.com/pingcap/tidb), and [Spanner](https://en.wikipedia.org/wiki/Spanner_(database)) distributed SQL databases
- [Jepsen integrity analyses](https://jepsen.io/analyses) of various databases
- [Fear and Loathing in Lock-Free Programming](https://medium.com/@tylerneely/fear-and-loathing-in-lock-free-programming-7158b1cdd50c) by Tyler Neely
- [Turning the Database Inside Out](https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/) by Martin Kleppmann
- [ZFS](http://open-zfs.org/wiki/Main_Page) and [BTRFS](https://en.wikipedia.org/wiki/Btrfs) checksummed filesystems
- [Error-correcting code RAM](https://en.wikipedia.org/wiki/ECC_memory)

## Check out OddSlingers

If you made it this far, you deserve to see it all in action!

<center>

![](https://i.imgur.com/RmqA2g5.jpg)

<img src="https://i.imgur.com/eC8alr1.png" height="50px"/> &nbsp; <a href="https://oddslingers.com/accounts/login/?utm_source=blog" class="btn btn-success btn-lg">Play Poker on OddSlingers</a> &nbsp; <img src="https://i.imgur.com/eC8alr1.png" height="50px"/>

If you sign up using this link, you'll get 5000 free chips to play poker with on OddSlingers.

</center>

## Appendix

1. **Single-threaded transaction queue processor with entire-table locking:**

```python
from time import sleep

from django.db import connection, transaction

def put_transaction(condition, action):
    Txn.objects.create(condition=condition, action=action, status='pending')

def pop_transaction():
    # poll for the oldest pending transaction, re-querying on each iteration
    while True:
        txn = Txn.objects.filter(status='pending').order_by('created').first()
        if txn:
            return txn
        sleep(0.1)

def banking_runloop():
    while True:
        txn = pop_transaction()
        with connection.cursor() as cursor:
            # lock every table the condition & action touch (MySQL syntax; table
            # names can't be query parameters, so list them in the statement itself)
            cursor.execute("LOCK TABLES ... WRITE")
            try:
                with transaction.atomic(using='default'):
                    if eval(txn.condition):  # sanitize any user-inputted strings!
                        eval(txn.action)
                        Txn.objects.filter(id=txn.id).update(status='succeeded')
                    else:
                        raise Exception('Condition was false')
            except Exception:
                Txn.objects.filter(id=txn.id).update(status='failed')
            finally:
                cursor.execute("UNLOCK TABLES")
```
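For completeness, submitting work to that runloop from a web request might look something like this, reusing the hypothetical `condition`/`action` strings from the earlier queue example:

```python
# e.g. from a view handling a chip purchase for user 241:
put_transaction(condition="balance_gt(241, 50)", action="buy_chips(241, 50)")
# the runloop evaluates the condition and applies the action
# under the table lock, in submission order
```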


