I read the data as follows:

import datetime

print datetime.datetime.now()
opts = m.get_unique_dates_for_underlying(ticker, "date")
print datetime.datetime.now()

With the following output:

2015-11-02 22:46:50.371000
2015-11-02 22:46:50.371000

Now, when I read the data into my DataFrame, it takes 12 minutes to run:

df_opts = pd.DataFrame(list(opts)).convert_objects()

Is there a faster way to accomplish this?

df_opts is of length 351,500.
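
For what it's worth, a minimal sketch of splitting that one-liner into separately timed steps, assuming `opts` is a lazy pymongo cursor, to see whether the list materialisation, the frame construction, or convert_objects() dominates (the split is mine, not something I have profiled yet):

import datetime
import pandas as pd

t0 = datetime.datetime.now()
docs = list(opts)                    # materialise the cursor into a list
t1 = datetime.datetime.now()
df_opts = pd.DataFrame(docs)         # build the DataFrame from the documents
t2 = datetime.datetime.now()
df_opts = df_opts.convert_objects()  # dtype inference (deprecated in later pandas)
t3 = datetime.datetime.now()

print "list(opts):       ", t1 - t0
print "pd.DataFrame(...):", t2 - t1
print "convert_objects():", t3 - t2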

  • Does `m.get_unique_dates_for_underlying` return a generator? If so, that would explain why it returns 'instantly', while packing it into your DataFrame casts it into a list, which Python builds by iterating over all 351,500 objects. If it is a generator, can you not let pandas load from it directly, so you skip the explicit creation of a massive list (see the sketch after these comments)? – Christian Witts Nov 03 '15 at 11:55
  • @ChristianWitts I followed http://stackoverflow.com/questions/16249736/how-to-import-data-from-mongodb-to-pandas regarding my use of list(cursor) – jason m Nov 03 '15 at 11:57
  • From reading around, this seems to be a known issue. Have you tried [monary](https://bitbucket.org/djcbeach/monary/wiki/Home)? – Leb Nov 03 '15 at 12:48
  • @Leb Thanks, yes, I discovered monary during my commute after posting this question. Thank you for the additional momentum to go that route. – jason m Nov 03 '15 at 13:35
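
A rough sketch of the generator/chunking idea from the comments: read from the cursor in slices instead of materialising one 351,500-element list up front. The helper names and chunk size below are assumptions for illustration, not anything from the post:

import itertools
import pandas as pd

def iter_chunks(cursor, size=50000):
    # yield successive lists of up to `size` documents from the cursor
    while True:
        chunk = list(itertools.islice(cursor, size))
        if not chunk:
            break
        yield chunk

def frame_from_cursor(cursor, size=50000):
    # build one small frame per chunk and concatenate them,
    # instead of creating one huge Python list first
    frames = [pd.DataFrame(chunk) for chunk in iter_chunks(cursor, size)]
    return pd.concat(frames, ignore_index=True)

df_opts = frame_from_cursor(opts)

Whether this actually helps depends on where the time goes in the timing breakdown above; if convert_objects() on object columns is the bottleneck, chunking the read alone may not change much.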

0 Answers