I read the data as follows:

import datetime

print datetime.datetime.now()
opts = m.get_unique_dates_for_underlying(ticker, "date")
print datetime.datetime.now()

With the following output:

2015-11-02 22:46:50.371000
2015-11-02 22:46:50.371000

Now, when I read the data into my DataFrame, it takes 12 minutes to run:

df_opts = pd.DataFrame(list(opts)).convert_objects()

Is there a faster way to accomplish this?

df_opts is of length 351,500.
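
For what it's worth, a minimal sketch of splitting that one-liner into separately timed steps, assuming `opts` is a lazy pymongo cursor, to see whether the list materialisation, the frame construction, or convert_objects() dominates (the split is mine, not something I have profiled yet):

import datetime
import pandas as pd

t0 = datetime.datetime.now()
docs = list(opts)                    # materialise the cursor into a list
t1 = datetime.datetime.now()
df_opts = pd.DataFrame(docs)         # build the DataFrame from the documents
t2 = datetime.datetime.now()
df_opts = df_opts.convert_objects()  # dtype inference (deprecated in later pandas)
t3 = datetime.datetime.now()

print "list(opts):       ", t1 - t0
print "pd.DataFrame(...):", t2 - t1
print "convert_objects():", t3 - t2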

  • Does `m.get_unique_dates_for_underlying` return a generator? If so, that would explain why it returns 'instantly', while packing it into your DataFrame casts it into a list, which Python builds by iterating over all 351,500 objects. If it is a generator, can you not let pandas load from it directly, so you skip the explicit creation of a massive list (see the sketch after these comments)? – Christian Witts Nov 03 '15 at 11:55
  • @ChristianWitts I followed http://stackoverflow.com/questions/16249736/how-to-import-data-from-mongodb-to-pandas regarding my use of list(cursor) – jason m Nov 03 '15 at 11:57
  • From reading around, this seems to be a known issue. Have you tried [monary](https://bitbucket.org/djcbeach/monary/wiki/Home)? – Leb Nov 03 '15 at 12:48
  • @Leb Thanks, yes, I discovered monary during my commute after posting this question. Thank you for the additional momentum to go that route. – jason m Nov 03 '15 at 13:35
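
A rough sketch of the generator/chunking idea from the comments: read from the cursor in slices instead of materialising one 351,500-element list up front. The helper names and chunk size below are assumptions for illustration, not anything from the post:

import itertools
import pandas as pd

def iter_chunks(cursor, size=50000):
    # yield successive lists of up to `size` documents from the cursor
    while True:
        chunk = list(itertools.islice(cursor, size))
        if not chunk:
            break
        yield chunk

def frame_from_cursor(cursor, size=50000):
    # build one small frame per chunk and concatenate them,
    # instead of creating one huge Python list first
    frames = [pd.DataFrame(chunk) for chunk in iter_chunks(cursor, size)]
    return pd.concat(frames, ignore_index=True)

df_opts = frame_from_cursor(opts)

Whether this actually helps depends on where the time goes in the timing breakdown above; if convert_objects() on object columns is the bottleneck, chunking the read alone may not change much.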

0 Answers