TIMO ZIMMERMANN

Managing state in Django models

posted on Friday 20th of March 2020

A Django model will often contain some form of state, and there will most likely be events which modify that state. While those two things are not always called by these names, the concept still exists. More often than not you will find a view controller assigning the new state to an instance of a model when certain conditions are met. In some cases you will see the event code being abstracted to make it easier to test, maybe into a separate function or class, maybe into a model method. And more often than you would like, you will see this messing up the state of an instance, forcing someone to wake up at 2am to connect to the database – or Django admin, if set up – and set a valid state. Thankfully, the chance of that last thing happening can be greatly reduced.

Over the years I found that not everyone is familiar with the concept of a finite-state machine (which would help address this problem). I have also met people who had in-depth knowledge of state machines in all their forms and would have been able to implement one as a stand-alone system, but had trouble integrating one with a Django model. In this article we will focus on the practical application: centralised logic that controls state and the events transitioning the state of an object.
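
For readers new to the concept, here is a minimal, framework-free sketch of a finite-state machine. It is only an illustration: the `Door` class, the `ALLOWED` table and the `InvalidTransition` exception are hypothetical names and not part of the code used later in this article. The idea is simply that one table lists which states may follow which, and every change goes through a single method that consults it.

```python
# Minimal, framework-free sketch of a finite-state machine.
# Door, ALLOWED and InvalidTransition are hypothetical names for illustration.


class InvalidTransition(Exception):
    pass


# Each state maps to the set of states it may move to.
ALLOWED = {
    "open": {"closed"},
    "closed": {"open", "locked"},
    "locked": {"closed"},
}


class Door:
    def __init__(self, state="closed"):
        self.state = state

    def move_to(self, new_state):
        # Reject anything the table does not list for the current state.
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.state = new_state


door = Door()
door.move_to("locked")
try:
    door.move_to("open")  # locked -> open is not in the table
except InvalidTransition as exc:
    print("rejected:", exc)
```

The Django examples below follow the same pattern, with the table living next to the model and the check happening in save().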

This article assumes you have basic knowledge of Django and Python and are familiar with the way Django’s models work.
You can find the code used in this demo here.

Traffic Lights

Let us start with the simplest example I can think of – traffic lights. A traffic light has three different states: red, yellow and green. There are three possible transitions between those states: red to green, green to yellow, and yellow to red.

The order in which those transitions happen is – during regular operation – always the same. While traffic lights are actually a lot harder in practice, we simply assume we have a script changing the state at regular intervals. To do this the script fetches the traffic light entry from the database and calls a transition method. Since there is only one possible state to transition to, we do not need to handle multiple events.

# coding: utf-8
from django.db import models


STATE_RED = "stop"
STATE_YELLOW = "caution"
STATE_GREEN = "gogogo"

STATE_CHOICES = (
    (STATE_RED, STATE_RED),
    (STATE_YELLOW, STATE_YELLOW),
    (STATE_GREEN, STATE_GREEN),
)

TRANSITIONS = {
    STATE_RED: STATE_GREEN,
    STATE_YELLOW: STATE_RED,
    STATE_GREEN: STATE_YELLOW,
}


class TrafficLight(models.Model):
    """to change state of the traffic light call `TrafficLight().transition()`
    """

    # keep state when initialising the model to avoid additional DB lookups
    __current_state = None

    state = models.CharField(
        max_length=20, choices=STATE_CHOICES, default=STATE_RED
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__current_state = self.state

    def save(
        self,
        force_insert=False,
        force_update=False,
        using=None,
        update_fields=None,
    ):
        allowed_next = TRANSITIONS[self.__current_state]

        # skip validation if the model is being created
        updated = self.state != self.__current_state
        if self.pk and updated and allowed_next != self.state:
            raise Exception("Invalid transition.", self.state, allowed_next)

        # manually set __current_state to ensure instances can be used multiple
        # times without running into validation errors
        if self.pk and updated:
            self.__current_state = allowed_next

        return super().save(
            force_insert=force_insert,
            force_update=force_update,
            using=using,
            update_fields=update_fields,
        )

    def transition(self):
        next_state = TRANSITIONS[self.state]
        self.state = next_state
        self.save()

There are a few things going on here and, I would assume, one thing that might look a bit different from what some might expect of a regular Django model. If you have never overridden a model's save() method I would advise reading up on the various arguments and what they mean. They might be useful some day.

First of all we override __init__() to keep the current state of the traffic light around. By doing so we can avoid an additional database call in save(), where otherwise we would have to fetch the instance we are operating on to compare the new value to the existing one.

Next, in save() we actually ensure that a transition from the current state to the new state is valid.

We update __current_state so next time we call transition() the instance actually knows about the current state of the traffic light without refreshing from the database.

Using a shell session we can validate everything is working as expected.

>>> from traffic.models import TrafficLight
>>> t = TrafficLight.objects.create()
>>> t.state
'stop'
>>> t.transition()
>>> t.state
'gogogo'
>>> t.transition()
>>> t.state
'caution'
>>> t.transition()
>>> t.state
'stop'
>>> t2 = TrafficLight.objects.get(pk=t.pk)
>>> t2.state
'stop'
>>>

But what happens when we manually try to set an invalid state?

>>> from traffic.models import TrafficLight, STATE_RED
>>> t = TrafficLight.objects.last()
>>> t.state
'gogogo'
>>> t.state = STATE_RED
>>> t.save()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../django-state-machine/traffic/models.py", line 50, in save
    raise Exception("Invalid transition.", self.state, allowed_next)
Exception: ('Invalid transition.', 'stop', 'caution')
>>>

Our save() method properly throws an exception. Perfect, so we guarded against the most common ways to set an illegal state. Someone can obviously still run raw SQL queries via django.db.connection, but once we take this into consideration we would also have to account for direct connections to the database for example, which is out of scope for this article.

Airport Pickup

Now that we have seen a basic, functioning example, let us dive into a slightly more complex one.

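Let us write a small piece of software for a hotel which offers an airport shuttle service for their VIP customers. A standard pickup of a customer arriving at an airport would look like this: a request comes in and is assigned to a driver, the driver accepts and drives to the airport, picks up the customer, drives to the hotel and drops the customer off.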

Now there are obviously a few things which might need a different transition than the ones outlined in the flow above – like a customer who needs to be transported to the airport or a driver declining a request at any stage before picking up a customer which then needs to be re-assigned. We will not cover all possible scenarios, just enough to demonstrate the concept of a model reacting to events which determine the new state.

# coding: utf-8
from django.db import models


STATE_REQUEST = "request"
STATE_WAITING = "waiting"
STATE_TO_AIRPORT = "to_airport"
STATE_TO_HOTEL = "to_hotel"
STATE_DROPPED_OFF = "dropped_off"

STATE_CHOICES = (
    (STATE_REQUEST, STATE_REQUEST),
    (STATE_WAITING, STATE_WAITING),
    (STATE_TO_AIRPORT, STATE_TO_AIRPORT),
    (STATE_TO_HOTEL, STATE_TO_HOTEL),
    (STATE_DROPPED_OFF, STATE_DROPPED_OFF),
)

TRANSITIONS = {
    STATE_REQUEST: [STATE_WAITING,],
    STATE_WAITING: [STATE_REQUEST, STATE_TO_AIRPORT],
    STATE_TO_AIRPORT: [STATE_TO_HOTEL, STATE_REQUEST],
    STATE_TO_HOTEL: [STATE_DROPPED_OFF],
    STATE_DROPPED_OFF: [],
}


class Pickup(models.Model):
    __current_state = None

    state = models.CharField(
        max_length=20, choices=STATE_CHOICES, default=STATE_REQUEST
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__current_state = self.state

    def save(
        self,
        force_insert=False,
        force_update=False,
        using=None,
        update_fields=None,
    ):
        allowed_next = TRANSITIONS[self.__current_state]

        updated = self.state != self.__current_state
        if self.pk and updated and self.state not in allowed_next:
            raise Exception("Invalid transition.", self.state, allowed_next)

        if self.pk and updated:
            self.__current_state = self.state

        return super().save(
            force_insert=force_insert,
            force_update=force_update,
            using=using,
            update_fields=update_fields,
        )

    def _transition(self, state):
        self.state = state
        self.save()

    def assign(self, driver):
        # we omit storing the driver on the model for simplicity of the example
        self._transition(STATE_WAITING)

    def accept(self):
        self._transition(STATE_TO_AIRPORT)

    def decline(self):
        self._transition(STATE_REQUEST)

    def picked_up(self):
        self._transition(STATE_TO_HOTEL)

    def dropped_off(self):
        self._transition(STATE_DROPPED_OFF)

Instead of calling transition() we call an event like picked_up() or decline(). While the methods are only calling the _transition() method and passing in the state to transition to, in a real application we would likely trigger additional functionality like sending an SMS to let a driver know a request was assigned to them or letting the hotel staff know that a customer was just dropped off so they can start the sign in procedure.

A short test run demonstrates that the transitions work as expected.

>>> from pickup.models import Pickup
>>> p = Pickup.objects.create()
>>> p.state
'request'
>>> p.assign("driver1")
>>> p.state
'waiting'
>>> p.decline()
>>> p.state
'request'
>>> p.assign("driver2")
>>> p.state
'waiting'
>>> p.accept()
>>> p.state
'to_airport'
>>> p.picked_up()
>>> p.state
'to_hotel'
>>> p.dropped_off()
>>> p.state
'dropped_off'
>>>
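
The events above only change state. As a rough sketch of how an event could trigger additional functionality, the following framework-free example validates the transition first and only then sends a notification. `PickupSketch`, the `notify` callable and the message text are hypothetical, and persistence is left out on purpose; in the real model the call to save() would sit where the state is assigned.

```python
# Framework-free sketch: an event that changes state and then notifies.
# PickupSketch, notify and the message text are hypothetical.

TRANSITIONS = {
    "request": ["waiting"],
    "waiting": ["request", "to_airport"],
    "to_airport": ["to_hotel", "request"],
    "to_hotel": ["dropped_off"],
    "dropped_off": [],
}


class PickupSketch:
    def __init__(self, notify, state="request"):
        self.state = state
        self._notify = notify  # any callable taking a message string

    def _transition(self, state):
        if state not in TRANSITIONS[self.state]:
            raise ValueError(f"Invalid transition: {self.state} -> {state}")
        self.state = state

    def dropped_off(self):
        # Validate and change state first, so that a rejected event
        # never sends a message.
        self._transition("dropped_off")
        self._notify("Guest dropped off, please prepare check-in.")


sent = []
pickup = PickupSketch(notify=sent.append, state="to_hotel")
pickup.dropped_off()
print(pickup.state, sent)
```

Passing the notifier in as a callable is just one possible design; it keeps the sketch testable without an SMS gateway.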

State assignments

You might now rightfully argue that things can still go wrong, especially if additional functionality should be triggered as part of an event. What if someone sets a pickup to dropped_off which was in the to_hotel state? It is a valid transition, but hotel staff would never be notified.

You can obviously make this part of your code reviews, but as your codebase and team grow this practice will fail you at some point. Not specifically for assigning state, but for basically everything. People miss things, that is just a matter of fact.

If you reach the point at which you want to add additional safeguards you can always extend your model with more validators or add linter rules. One approach might be checking the caller of the _transition() method to ensure it is whitelisted.

import inspect


class Foo:
    def transition(self):
        curframe = inspect.currentframe()
        calframe = inspect.getouterframes(curframe, 2)

        if calframe[1].function != "caller1":
            raise Exception("nope")

    def caller1(self):
        self.transition()


def caller2(foo):
    foo.transition()


f = Foo()
print("caller1")
f.caller1()
print("caller2")
caller2(f)

Calling transition() from caller1() is allowed while caller2() throws an exception.

>>> import test
caller1
caller2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../django-state-machine/pickup/test.py", line 24, in <module>
    caller2(f)
  File ".../django-state-machine/pickup/test.py", line 17, in caller2
    foo.transition()
  File ".../django-state-machine/pickup/test.py", line 10, in transition
    raise Exception("nope")
Exception: nope

While many Django devs prefer fat models, I would argue that with enough additional validation you will blow up the size of the model significantly. At that point you should externalise the state transitions and validation code and pass your model in. This usually results in a design which is easier to test and reason about.
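
To make the idea of externalising more concrete, here is one possible shape, sketched without Django so it runs on its own. `PickupMachine`, `fire()` and `FakePickup` are hypothetical names. The only assumption about the object passed in is that it has a `state` attribute and a `save()` method, which a model instance would satisfy. The table mirrors the pickup transitions from above, keyed by event name.

```python
# Sketch of transition logic living outside the model.
# PickupMachine, fire() and FakePickup are hypothetical names.


class InvalidTransition(Exception):
    pass


class PickupMachine:
    # state -> {event name -> next state}
    transitions = {
        "request": {"assign": "waiting"},
        "waiting": {"accept": "to_airport", "decline": "request"},
        "to_airport": {"picked_up": "to_hotel", "decline": "request"},
        "to_hotel": {"dropped_off": "dropped_off"},
        "dropped_off": {},
    }

    def __init__(self, instance):
        self.instance = instance

    def fire(self, event):
        # Look up the event for the current state; anything else is rejected.
        current = self.instance.state
        try:
            new_state = self.transitions[current][event]
        except KeyError:
            raise InvalidTransition(
                f"{event!r} not allowed in {current!r}"
            ) from None
        self.instance.state = new_state
        self.instance.save()
        return new_state


class FakePickup:
    """Stand-in for a model instance, enough to test the machine alone."""

    def __init__(self, state="request"):
        self.state = state
        self.saved = 0

    def save(self):
        self.saved += 1


pickup = FakePickup()
machine = PickupMachine(pickup)
machine.fire("assign")
machine.fire("accept")
print(pickup.state, pickup.saved)  # to_airport 2
```

Because the machine never imports the model, it can be unit tested with a stand-in like `FakePickup` and no database.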

Alternative approaches and pitfalls

An alternative approach would be a pre_save receiver instead of overriding save(). While signals sound like a good idea, in practice they have proven to be a burden during the maintenance phase. This does not mean there is no time or place to use signals, but such cases are far rarer than some codebases make you believe.

You want to be very careful about how you trigger events. For more complex examples relying on the current state of an instance you usually do not want to run events and transitions in parallel, especially when the outcome of a transition is communicated back to a client which changes state based on that outcome. Some form of locking or a FIFO queue might solve this problem sufficiently well, but at the end of the day it really depends on your specific use case.
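
As an illustration of the FIFO idea, the sketch below pushes events into a `queue.Queue` and lets a single worker thread apply them one after another, so two transitions can never interleave. `Light`, `worker` and the "tick" events are hypothetical. Note that this only serialises events inside one process; with several Django processes you would more likely need a database row lock (for example `select_for_update()` inside a transaction) or an external queue, which is beyond this sketch.

```python
# Sketch: serialising events through a FIFO queue with one worker thread.
# Light, worker and the "tick" events are hypothetical.
import queue
import threading

TRANSITIONS = {"stop": "gogogo", "gogogo": "caution", "caution": "stop"}


class Light:
    def __init__(self):
        self.state = "stop"

    def transition(self):
        self.state = TRANSITIONS[self.state]


def worker(events, light, history):
    # Only this thread touches `light`, so transitions never interleave.
    while True:
        event = events.get()
        if event is None:  # sentinel: stop the worker
            break
        light.transition()
        history.append(light.state)


events = queue.Queue()
light = Light()
history = []
thread = threading.Thread(target=worker, args=(events, light, history))
thread.start()

# Producers only enqueue; they never change state themselves.
for _ in range(4):
    events.put("tick")
events.put(None)
thread.join()
print(history)
```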

Libraries

I want to mention two libraries which have proven to be quite good and might come in handy if you do not have a Django model to work with, do not want to build everything yourself or simply want to study possible implementations – transitions and state_machine.

Conclusion

There is some overhead involved in building proper state management into your model, and it might require some comments in code or a small document outlining how exactly it works (so new hires can catch up with the design of the system). The examples might also seem a bit dull and solvable with less code, but past experience shows that as you scale your application and solve harder problems than a traffic light, proper state management is something worth building. This is especially true in business-critical paths of your code, where a wrong state might result in customers receiving a sub-par service, you losing money or, worst case, the system ending up in a state which cannot be recovered and blocks users from interacting with it.