
I am working on a Django project where I need to save data from multiple pandas DataFrames into Django models. I use MySQL to store the data.

However, one of the DataFrames, df, is equivalent to the Django model/database table where I want to store the data, so my SQL table is a one-to-one copy of df. df is also the largest DataFrame that I have.

I was wondering whether it would make sense to simply use df.to_sql instead of saving the data through the Django model, where I have to iterate over the rows of df. I already tried it and it seems to work fine (and fast). However, I am not sure whether such an approach is good from Django's perspective, because I save the rest of the data through models.
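
For concreteness, here is a minimal sketch of the two approaches I am comparing. The model name, field names, and connection string are placeholders, not my actual code:

```python
import pandas as pd
from sqlalchemy import create_engine

from myapp.models import Measurement  # hypothetical model mirroring df's columns

df = pd.DataFrame({"sensor": ["a", "b"], "value": [1.0, 2.0]})

# Approach 1: go through the ORM, one object per row
for row in df.itertuples(index=False):
    Measurement.objects.create(**row._asdict())

# Approach 2: write the DataFrame straight into the model's table,
# bypassing the ORM (to_sql needs an SQLAlchemy engine for MySQL)
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
df.to_sql(Measurement._meta.db_table, con=engine, if_exists="append", index=False)
```

(For the ORM route, Measurement.objects.bulk_create() would at least avoid one INSERT per row, although it also skips save() and signals.)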

I would really appreciate any help or suggestions!

nova
    This should be fine. You will lose any Django hooks/signals if you have them on the model though. – reptilicus Dec 12 '16 at 23:41
  • @reptilicus is correct. If you have some middleware that tracks changes to models to allow for revisions, or additional logic that extends a model's save() method, you're possibly bypassing it by using pandas' to_sql. The thing most likely to cause problems is if you have any fields with choices limited to a list; by saving directly into the database, you're likely bypassing that logic (a sketch of what gets skipped follows after these comments). – DataSwede Dec 12 '16 at 23:46
  • Not what you asked, but if you don't use the model in any way other than loading it into a DataFrame, then storing the DataFrame's records as database records (i.e. model objects) is suboptimal, and storing the DataFrame in a FileField (with `to_csv` or similar) could be far more efficient; a rough sketch of that also follows below. See http://stackoverflow.com/questions/25212009/django-postgres-large-time-series – Antonis Christofides Dec 13 '16 at 11:09
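
To make the first two comments concrete, here is a rough sketch (model, field, and signal names are invented for illustration) of the kind of logic that df.to_sql would silently skip, since it writes to the table without going through the ORM:

```python
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver


class Measurement(models.Model):  # hypothetical model mirroring df
    SENSOR_CHOICES = [("a", "Sensor A"), ("b", "Sensor B")]

    sensor = models.CharField(max_length=10, choices=SENSOR_CHOICES)
    value = models.FloatField()

    def save(self, *args, **kwargs):
        # custom logic (validation, revision tracking, ...) lives here;
        # full_clean() is what actually enforces the `choices` restriction
        self.full_clean()
        super().save(*args, **kwargs)


@receiver(post_save, sender=Measurement)
def track_revision(sender, instance, created, **kwargs):
    # post-save bookkeeping that never fires for rows inserted via to_sql
    pass
```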
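
And a rough sketch of the FileField idea from the last comment (model and function names are placeholders), where the whole DataFrame is stored as a single file instead of one database row per record:

```python
import io

from django.core.files.base import ContentFile
from django.db import models


class TimeseriesBlob(models.Model):  # hypothetical container model
    name = models.CharField(max_length=100)
    data = models.FileField(upload_to="timeseries/")


def store_dataframe(df, name):
    # serialize the DataFrame once and attach it to the model as a file
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    blob = TimeseriesBlob(name=name)
    blob.data.save(f"{name}.csv", ContentFile(buf.getvalue()))  # also saves the model
    return blob
```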

0 Answers