
I'm looking for a vectorized way to create Django model objects from a pandas DataFrame, one object per row.

I've looked for similar questions and associated answers.

I found https://stackoverflow.com/a/34431482/2193381, but I don't want to hardcode the database URL and similar settings, so I am looking for other solutions.

The best I can come up with still uses .apply and looks like:

def qux(row):
    return MyDjangoModel(
        foo=row['foo'],
        bar=row['bar']
    )

data['obj'] = data.apply(qux, axis=1)

MyDjangoModel.objects.bulk_create(
    list(data['obj']),
    ignore_conflicts=True
)

Is there a better way?

Vishal

1 Answer


I timed the code above and one other approach, and am sharing the results:

Number of rows: 31,940

pd.read_csv(): 6.757 seconds
.apply(qux): 1.783 seconds
bulk_create(): 3.560 seconds

A better alternative is probably a list comprehension over the row records:

rawlist = data.to_dict('records')

objlist = [MyDjangoModel(foo=row['foo'], bar=row['bar']) for row in rawlist]

MyDjangoModel.objects.bulk_create(objlist, ignore_conflicts=True)

The list comprehension took 1.011 seconds, compared with 1.783 seconds for .apply.
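
For anyone who wants to repeat the comparison on their own data, here is a minimal timing sketch. It makes several assumptions: FakeModel is a hypothetical stand-in for MyDjangoModel so the script runs without a Django project, the data is synthetic, and the itertuples variant is an extra option that was not part of the timings above. Actual numbers will depend on the machine and the DataFrame.

```python
import time

import pandas as pd


class FakeModel:
    """Hypothetical stand-in for MyDjangoModel so the sketch runs without Django."""

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar


def build_with_apply(data):
    # Row-wise apply: pandas constructs a Series object for every row.
    return list(
        data.apply(lambda row: FakeModel(foo=row["foo"], bar=row["bar"]), axis=1)
    )


def build_from_records(data):
    # One plain dict per row, then an ordinary list comprehension.
    return [FakeModel(foo=row["foo"], bar=row["bar"]) for row in data.to_dict("records")]


def build_from_itertuples(data):
    # Namedtuples instead of dicts; index=False leaves the index out of each tuple.
    return [FakeModel(foo=row.foo, bar=row.bar) for row in data.itertuples(index=False)]


def timed(func, data):
    # Return the built objects together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic data; only the column names foo and bar come from the question.
    data = pd.DataFrame({"foo": range(10_000), "bar": [str(i) for i in range(10_000)]})
    for func in (build_with_apply, build_from_records, build_from_itertuples):
        objs, seconds = timed(func, data)
        print(f"{func.__name__}: {len(objs)} objects in {seconds:.3f} seconds")
```

In real code, FakeModel would be replaced by MyDjangoModel and the resulting list passed to MyDjangoModel.objects.bulk_create(...) as shown above.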

Vishal