Migrating a Django ManyToManyField to use a new 'through' Model

A word of caution:
As I write this I think... "I really hope there isn't some easier way that I just couldn't find".
Background
Django has a nifty auto-managed models.ManyToManyField
table that does everything automatically for you. This includes ids, uniqueness, and probably other things. You never have to think about it. This is quite handy but also makes things a little trickier when you want to add some additional data to that table at a later date.
Recently I had exactly that issue, and not re-using this join table would have meant making another one exactly like it and maintaining the data twice. Not ideal.
Read about custom join tables in Django's docs. There are some drawbacks to be aware of, but I think its straightforward enough.
An overview of the 4 migrations needed to make this happen:
- Create the new
Model
(to use as the new join 'through') - Migrate the data from the old table to the new one
- Drop the old many to many from your model
- Add the new ManyToMany field back, with the same name
Simplified Base Models
We are modeling an app called university
, which has the concept of a Building
which can have many Departments
. Of course, one Department
can be in multiple Buildings
as well.
class Building(models.Model):
name = CharField(max_length=50)
departments = ManyToManyField(Product)
class Department(models.Model):
name = CharField(max_length=50)
budget = IntegerField()
This results in a database table joining the two Models that looks like this (not real sql):
CREATE TABLE university_building_department (
id int not null primary key,
building_id int not null fk('university_building_id'),
department_id int not null fk('university_department_id')
) WITH CONSTRAINT (
unique over ['building_id', 'department_id']
)
1. Creating the new model
For the new model we are going to add some data to the university_building_department
so we can track who added it. You would have more interesting data!
class BuildingDepartment(models.Model):
building = ForeignKey(Building, on_delete=models.CASCADE)
department = ForeignKey(Department, on_delete=models.CASCADE)
added_by = CharField(max_length=50, default='unknown')
class Meta:
# i like to specify my m-2-m join tables this way, with _x_
# note, I kept the app name. not required, but consistent
table_name = "university_building_x_department"
unique_together = [
['building', 'department']
]
Now that we have a new model, we need to add a migration for it. Do that by running:
(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
0008_add_table_building_department.py:
- Add model BuildingDepartment
- Alter unique_together for buildingdepartment (1 constraint(s))
2. Migrating the data from the old table into the new one
I find it easiest to migrate the data using a migrations.RunSQL
rather than trying to futz with the schema_editor
. If you prefer to go that way, implement this in RunPython
. Of course, this is specific to your database, but I think this is pure ANSI SQL from probably before my time and it shouldn't need modification.
(venv) [~/myapp] ./manage.py makemigrations university --empty
Migrations for 'university':
0009_auto_20160226_0919.py:
(venv) [~/myapp] mv \
university/migrations/0009_auto_20160226_0919.py \
university/migrations/0009_migrate_join_table_data.py
Now that we have the (sensibly renamed) join table migration, open it up and change the contents to look somewhat like this migration. The point is to manually move the data from one table to the other, so we need the default values.
migrate_join_table_data = """
INSERT INTO university_building_x_department (building_id,department_id,added_by)
SELECT building_id, department_id, 'unknown'
FROM university_building_department
"""
class Migration(migrations.Migration):
dependencies = [
('university', '0008_add_table_building_department'),
]
operations = [
migrations.RunSQL(migrate_join_table_data)
]
3. Remove the old ManyToManyField
Now we need to drop the old many-to-many field.
- If we did this first then Django would automatically delete the table we need for our source data in #2.
- We can't modify an automatic
ManyToMany
field. If you try you will get an error duringmakemigrations
.
Simply update the Building class by commenting out the M2M field:
class Building(models.Model):
name = CharField(max_length=50)
# departments = ManyToManyField(Product)
(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
0010_remove_building_departments.py:
- Remove field departments from building
4. Add the many-to-may field back again, with 'to='
class Building(models.Model):
name = CharField(max_length=50)
departments = ManyToManyField(through='BuildingDepartment', to=Department)
And make the migrations!
(venv) [~/myapp] ./manage.py makemigrations
Migrations for 'university':
0011_add_building_departments.py:
- Add field departments to building
(venv)[~/myapp]: ./manage.py migrate
Operations to perform:
Synchronize unmigrated apps: django_fsm, corsheaders, ...
Synchronizing apps without migrations:
...
Running migrations:
Rendering model states... DONE
Applying university.0008_add_table_building_department... OK
Applying university.0009_migrate_join_table_data... OK
Applying university.0010_remove_building_departments... OK
Applying university.0011_add_building_departments... OK
5. Using the new join
Because we are using a custom through
table, we can't use the default way of adding new records to the set. We need to do it manually. This means changes when using django rest framework, etc. This makes sense: we now have extra data, and we need to give values for that data.
with transaction.atomic():
stadium = Building.objects.create(name="Steve Holt Stadium")
sports = Department.objects.create(name="Sports")
bd = BuildingDepartment.objects.create(
building = stadium,
department = sports,
added_by = 'george m.'
)
Accessing the joined data
Now that we've created it, the standard methods like .departments
work as you would expect. However, there is another set
added to both sides: buildingdepartment_set
.
b = Building.objects.get(name="Steve Holt Hall")
print(b.departments.all())
> [Department("Sports"), ]
print(b.buildingdepartment_set.all())
> [BuildingDepartment("Steve Holt Stadium", "Sports", "george m.")]
This new _set
lets us get the actual joined records, where the normal ManyToMany
gets the actual related departments. This is quite important, since we definitely want to see the data that we added.