February 26, 2016

Migrating a Django ManyToManyField to use a new 'through' Model

Migrating  a Django ManyToManyField to use a new 'through' Model

A word of caution:

As I write this I think... "I really hope there isn't some easier way that I just couldn't find".

Background

Django has a nifty auto-managed models.ManyToManyField table that does everything automatically for you. This includes ids, uniqueness, and probably other things. You never have to think about it. This is quite handy but also makes things a little trickier when you want to add some additional data to that table at a later date.

Recently I had exactly that issue, and not re-using this join table would have meant making another one exactly like it and maintaining the data twice. Not ideal.

Read about custom join tables in Django's docs. There are some drawbacks to be aware of, but I think its straightforward enough.

An overview of the 4 migrations needed to make this happen:

  1. Create the new Model (to use as the new join 'through')
  2. Migrate the data from the old table to the new one
  3. Drop the old many to many from your model
  4. Add the new ManyToMany field back, with the same name

Simplified Base Models

We are modeling an app called university, which has the concept of a Building which can have many Departments. Of course, one Department can be in multiple Buildings as well.

class Building(models.Model):
    name = CharField(max_length=50)
    departments = ManyToManyField(Product)

class Department(models.Model):
    name = CharField(max_length=50)
    budget = IntegerField()

This results in a database table joining the two Models that looks like this (not real sql):

CREATE TABLE university_building_department (
    id             int not null primary key,
    building_id    int not null fk('university_building_id'),
    department_id  int not null fk('university_department_id')
) WITH CONSTRAINT (
     unique over ['building_id', 'department_id']
)

1. Creating the new model

For the new model we are going to add some data to the university_building_department so we can track who added it. You would have more interesting data!

class BuildingDepartment(models.Model):
    building   = ForeignKey(Building, on_delete=models.CASCADE)
    department = ForeignKey(Department, on_delete=models.CASCADE)
    added_by   = CharField(max_length=50, default='unknown')
 
    class Meta:
        # i like to specify my m-2-m join tables this way, with _x_
        # note, I kept the app name.  not required, but consistent
        table_name = "university_building_x_department"
        unique_together = [
            ['building', 'department']
        ]

Now that we have a new model, we need to add a migration for it. Do that by running:

(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
  0008_add_table_building_department.py:
    - Add model BuildingDepartment
    - Alter unique_together for buildingdepartment (1 constraint(s))

2. Migrating the data from the old table into the new one

I find it easiest to migrate the data using a migrations.RunSQL rather than trying to futz with the schema_editor. If you prefer to go that way, implement this in RunPython. Of course, this is specific to your database, but I think this is pure ANSI SQL from probably before my time and it shouldn't need modification.

(venv) [~/myapp] ./manage.py makemigrations university --empty
Migrations for 'university':
  0009_auto_20160226_0919.py:

(venv) [~/myapp] mv \
    university/migrations/0009_auto_20160226_0919.py \
    university/migrations/0009_migrate_join_table_data.py    

Now that we have the (sensibly renamed) join table migration, open it up and change the contents to look somewhat like this migration. The point is to manually move the data from one table to the other, so we need the default values.

migrate_join_table_data = """
INSERT INTO university_building_x_department (building_id,department_id,added_by)
  SELECT building_id, department_id, 'unknown'
  FROM university_building_department
"""
class Migration(migrations.Migration):
    dependencies = [
        ('university', '0008_add_table_building_department'),
    ]

    operations = [
        migrations.RunSQL(migrate_join_table_data)
    ]

3. Remove the old ManyToManyField

Now we need to drop the old many-to-many field.

  • If we did this first then Django would automatically delete the table we need for our source data in #2.
  • We can't modify an automatic ManyToMany field. If you try you will get an error during makemigrations.

Simply update the Building class by commenting out the M2M field:

class Building(models.Model):
    name = CharField(max_length=50)
    # departments = ManyToManyField(Product)
(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
  0010_remove_building_departments.py:
    - Remove field departments from building

4. Add the many-to-may field back again, with 'to='

class Building(models.Model):
    name = CharField(max_length=50)
    departments = ManyToManyField(through='BuildingDepartment', to=Department)

And make the migrations!

(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
  0011_add_building_departments.py:
    - Add field departments to building
(venv)[~/myapp]: ./manage.py migrate
Operations to perform:
  Synchronize unmigrated apps: django_fsm, corsheaders, ...
Synchronizing apps without migrations:
  ...
Running migrations:
  Rendering model states... DONE
  Applying university.0008_add_table_building_department... OK
  Applying university.0009_migrate_join_table_data... OK
  Applying university.0010_remove_building_departments... OK
  Applying university.0011_add_building_departments... OK

5. Using the new join

Because we are using a custom through table, we can't use the default way of adding new records to the set. We need to do it manually. This means changes when using django rest framework, etc. This makes sense: we now have extra data, and we need to give values for that data.

with transaction.atomic():
    stadium  = Building.objects.create(name="Steve Holt Stadium")
    sports   = Department.objects.create(name="Sports")
    bd = BuildingDepartment.objects.create(
        building   = stadium,
        department = sports,
        added_by   = 'george m.'
    )
Accessing the joined data

Now that we've created it, the standard methods like .departments work as you would expect. However, there is another set added to both sides: buildingdepartment_set.


b = Building.objects.get(name="Steve Holt Hall")

print(b.departments.all())
> [Department("Sports"), ]

print(b.buildingdepartment_set.all())
> [BuildingDepartment("Steve Holt Stadium", "Sports", "george m.")]

This new _set lets us get the actual joined records, where the normal ManyToMany gets the actual related departments. This is quite important, since we definitely want to see the data that we added.