
I have stock price data stored in a pandas DataFrame, as shown below (it was originally in a Panel, which I converted to a DataFrame):

        date  ticker  close       tsr
0 2013-03-28  abc     22.81  1.000439
1 2013-03-28  def     94.21  1.006947
2 2013-03-28  ghi     95.84  1.014180
3 2013-03-28  jkl     31.80  1.000000
4 2013-03-28  mno     32.10  1.003125
...many more rows

I want to save this in a Django model, which looks like this (field names match the columns):

class HistoricalPrices(models.Model):
    ticker = models.CharField(max_length=10)
    date = models.DateField()
    # max_digits/decimal_places are required on DecimalField; values here are illustrative
    tsr = models.DecimalField(max_digits=12, decimal_places=6)
    close = models.DecimalField(max_digits=12, decimal_places=2)

The best I've come up with so far is this, where df is my DataFrame:

entries = []
for e in df.T.to_dict().values():
    entries.append(HistoricalPrices(**e))
HistoricalPrices.objects.bulk_create(entries)
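For reference, `df.T.to_dict().values()` can also be written as `df.to_dict('records')`, which skips the transpose and yields the same per-row dicts; a small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2013-03-28', '2013-03-28']),
    'ticker': ['abc', 'def'],
    'close': [22.81, 94.21],
    'tsr': [1.000439, 1.006947],
})

# One dict per row, keyed by column name -- same shape as
# df.T.to_dict().values(), but without the transpose round-trip.
records = df.to_dict('records')
entries = list(records)  # each dict can be splatted into a model: Model(**r)
```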

Is there a better way to save this?

I've looked at django-pandas, but it looks like it only reads from the DB.


2 Answers


Rather than iterating over the DataFrame and saving one model instance at a time, it would be most efficient to use to_sql() with an engine built from your Django connection settings, run from inside your Django app:

from sqlalchemy import create_engine
from django.conf import settings

user = settings.DATABASES['default']['USER']
password = settings.DATABASES['default']['PASSWORD']
database_name = settings.DATABASES['default']['NAME']

database_url = 'postgresql://{user}:{password}@localhost:5432/{database_name}'.format(
    user=user,
    password=password,
    database_name=database_name,
)

engine = create_engine(database_url, echo=False)
# pass the model's database table name, not the model class itself
df.to_sql(HistoricalPrices._meta.db_table, con=engine, if_exists='append', index=False)
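A self-contained round-trip demo of the same `to_sql`/`read_sql` calls, using the stdlib sqlite3 driver so it runs anywhere; with the Postgres engine built above, the calls apply unchanged. The table name `app_historicalprices` is a stand-in for whatever `HistoricalPrices._meta.db_table` resolves to in your project:

```python
import sqlite3

import pandas as pd

# In-memory database for the demo; swap in the SQLAlchemy engine above
# when writing to the real Django-managed Postgres table.
con = sqlite3.connect(':memory:')

df = pd.DataFrame({
    'date': ['2013-03-28', '2013-03-28'],
    'ticker': ['abc', 'def'],
    'close': [22.81, 94.21],
    'tsr': [1.000439, 1.006947],
})

# 'app_historicalprices' stands in for HistoricalPrices._meta.db_table.
# index=False keeps the DataFrame index from being written as a column.
df.to_sql('app_historicalprices', con=con, if_exists='append', index=False)

# Read back; against a Django-managed table this would also return the
# auto-generated id column.
round_trip = pd.read_sql('SELECT * FROM app_historicalprices', con=con)
```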
  • Is there any way to get the auto ID generated by the database back into the `DataFrame`? (My Excel data needs to be stored in several Django models linked by foreign key.) – Chris Sep 16 '16 at 10:40
  • You can always use [read_sql](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html) to get the complete df back with all auto-generated columns. Django on its own will just use the `id` generated by the database, if that's what you're referring to. – Stefan Sep 16 '16 at 10:45
  • Thanks @Stefan, in my case identifying my latest subset of rows would be an expensive operation and possibly error prone. That said the alternative [seems](http://stackoverflow.com/questions/26770489/how-to-get-autoincrement-values-for-a-column-after-uploading-a-pandas-dataframe) to be locking the table and manually setting the ID, or writing [via CSV](http://stackoverflow.com/questions/31997859/bulk-insert-a-pandas-dataframe-using-sqlalchemy) so also likely to hit performance or lose IDs, and perhaps this is just a fundamental limitation of bulk inserts. I'll do more research and report back. – Chris Sep 16 '16 at 11:03
  • Great response. Just wondering what file you would place this code in to from the Django framework? – Brian Waters Oct 27 '17 at 14:14
  • Two questions: (1) Is there a way to load the database_url from settings as well, instead of hardcoding it? (2) Is this the most vectorized way to do this? – Vishal Apr 28 '19 at 14:12
  • This looks like you may want to ask a new question. – Stefan Apr 29 '19 at 16:18
  • For anyone who is having trouble with using the pandas to_sql function with django Model. You might need to specify the database table name rather than the model name in to_sql as follows. df.to_sql(HistoricalPrices._meta.db_table, con=engine) – hasan najeeb Feb 15 '21 at 18:21
  • All good, but where is the `create_engine` and does that conflict with the running engine (I suspect not)? – petroslamb Sep 24 '21 at 07:30
  • @petroslamb see pandas [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html) linked in the post. – Stefan Sep 24 '21 at 17:07
  • That's an SQLAlchemy engine. There is no mention of using that dependency, and it's kind of overkill to have both ORMs in there. But I guess it's the only ORM pandas supports? – petroslamb Sep 27 '21 at 01:51
  • This is a Django question, but the accepted answer includes a reference to `create_engine()`, which is not part of Django. Appears to come from SQLAlchemy, but accepted answer does not show where it's imported from. Hence the downvote :( – shacker Aug 25 '22 at 22:32
  • @shacker added the import line, hope this helps. – Stefan Sep 04 '22 at 18:55
  • Hi, this code works, but when I query the data I get **no such column: column_name.id**; it's as if my database doesn't generate ids. What I use is *df.to_sql(Package._meta.db_table, if_exists='replace', con=engine, index=False, index_label=None, method=None)* – rafaelHTML Sep 16 '22 at 15:13

An easier way you can try:

for row in df.T.to_dict().values():
    HistoricalPrices.objects.get_or_create(**row)
  • Inserting one row at a time is never a good choice. Having fewer lines of code doesn't matter. – anishtain4 Aug 12 '21 at 00:38
  • @anishtain4 You can also use `bulk_create`, but if you want to check whether a row already exists in the DB, you can't use `bulk_create`; you have to insert row by row. – Amir.S Jun 02 '23 at 14:05