1

I have the need to create many EVENT objects at once and subsequently create many ARCHIVED_EVENT objects that have a foreign key to their corresponding event.

My code looks like something like this:

events = []
archivedEvents = []
for _ in range(1000):
    event = Event(name="Test")
    archivedEvent = ArchivedEvent(event_id=event.id)
    archivedEvents.append(archivedEvent)
    events.append(event)

Event.objects.bulk_create(events)
ArchivedEvent.objects.bulk_create(archivedEvents)

Unfortunately all Archived Events created here have a NULL foreign key to an EVENT. I understand that the primary key for an object is not generated until it is saved to the database. But I save the events before creating the archived events. Am I missing something? Should I refresh the cache before bulk creating the archived events?

Andrew
  • 8,322
  • 2
  • 47
  • 70
Rage
  • 870
  • 9
  • 27

2 Answers2

0

event.id won't be created unless it does through bulk_create(). Probably you can use another loop to create ArchivedEvent. Like this:

events = []
archivedEvents = []

for _ in range(1000):
    event = Event(name="Test")
    events.append(event)
events = Event.objects.bulk_create(events)

for event in events:
    archivedEvent = ArchivedEvent(event_id=event.id)
    archivedEvents.append(archivedEvent)
ArchivedEvent.objects.bulk_create(archivedEvents)

This solution works for Postgresql as per documentation. If you have other databases, then try like this:

events = []
archivedEvents = []

for _ in range(1000):
    event = Event(name="Test")
    events.append(event)
Event.objects.bulk_create(events)

events = Event.objects.all().order_by('-id')[:1000][::-1]
for event in events:
    archivedEvent = ArchivedEvent(event_id=event.id)
    archivedEvents.append(archivedEvent)
ArchivedEvent.objects.bulk_create(archivedEvents)

To avoid race conditions, you can pre-generate primary keys, then use them. For example:

last_event_id = Event.objects.last().id + 1000 # getting last event id and adding thousand to avoid duplicates.

events = [x+last_event_id for x in range(1000)]
archivedEvents = []

for e in events:
    event = Event(name="Test",pk=e)
    events.append(event)
    archivedEvent = ArchivedEvent(event_id=e)
    archivedEvents.append(archivedEvent)

Event.objects.bulk_create(events)
ArchivedEvent.objects.bulk_create(archivedEvents)
ruddra
  • 50,746
  • 7
  • 78
  • 101
0

Here is a solution to avoid race condition with db's lock based on @ruddra's other database solution. Hope this can help you.

from django.db import transaction

with transaction.atomic():

   # lock first row to avoid race condition
   # note: first row of Event table must have content,
   #       if not, you need use other tables's not empty row to add lock.
   Event.objects.select_for_update().first()

   # just @ruddra's mysql solution
   events = []
   archivedEvents = []

   for _ in range(1000):
       event = Event(name="Test")
       events.append(event)
   Event.objects.bulk_create(events)

   event_ids = Event.objects.values_list('id', flat=True).order_by('-id')[:1000][::-1]
   for event_id in event_ids:
       archivedEvent = ArchivedEvent(event_id=event_id)
       archivedEvents.append(archivedEvent)
   ArchivedEvent.objects.bulk_create(archivedEvents)
Fogmoon
  • 569
  • 5
  • 16
  • What exactly is happening here? Why are we locking the first row? – Rage Jun 15 '20 at 08:42
  • It's just a lock provided by db to avoid race condition, you can use any other table row. Usually, I use first row as the lock. See [select-for-update](https://docs.djangoproject.com/en/3.0/ref/models/querysets/#select-for-update) – Fogmoon Jun 15 '20 at 08:48
  • Hmm, perhaps this lock technique would take more time than single saves() as this view could now run simultaneously but not really as some threads would need to wait. – Rage Jun 15 '20 at 09:13
  • I don't think so. Eventually all data need to wait lock at db level when insert. See [Which is faster: multiple single INSERTs or one multiple-row INSERT?](https://stackoverflow.com/questions/1793169/which-is-faster-multiple-single-inserts-or-one-multiple-row-insert), although the view now run simultaneously, it can handle the request more fast. Anyway, for the time cost, you'd better have a test. – Fogmoon Jun 15 '20 at 11:59