I have an ORM set up in Python with flyweights, with the underlying data stored in HDF5 files and held in memory as pandas DataFrames. The exact mechanics aren't super important, but I'm finding that data retrieval by indexing into pandas DataFrames is quite slow. For example, see the following:

[image not preserved]

To explain: FutDaily('ESA Index')._data returns a MultiIndex pandas DataFrame like the following:

[image not preserved]


That call is extremely fast because of the flyweight design pattern. However, I'm surprised that the indexing takes so long. Is there a faster way to pull out relevant data/slices from a MultiIndex? Isn't it doing some sort of dictionary lookup under the hood, in which case it should be super fast?
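Since the screenshots are not available, here is a minimal sketch of the kind of lookups in question. The frame below is only a hypothetical stand-in for FutDaily('ESA Index')._data: the level names (contract, date), the column names and the contract codes are all assumptions made for illustration. It uses loc and xs rather than ix, and sorts the index first, since pandas warns about performance when indexing a MultiIndex that is not lexsorted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for FutDaily('ESA Index')._data.
# Level names, column names and contract codes are assumptions.
contracts = ["ESH7", "ESM7", "ESU7"]
dates = pd.date_range("2017-01-02", periods=5, freq="B")
index = pd.MultiIndex.from_product([contracts, dates], names=["contract", "date"])
df = pd.DataFrame(
    {
        "PX_LAST": np.arange(len(index), dtype=float),
        "VOLUME": np.arange(len(index)) * 10,
    },
    index=index,
)

# Keep the index lexsorted before doing label-based lookups.
df = df.sort_index()

day = pd.Timestamp("2017-01-04")

# All rows for one value of the outer level (the outer level is dropped).
one_contract = df.loc["ESM7"]

# One row, addressed by a complete key tuple; this returns a Series.
row = df.loc[("ESM7", day)]

# One scalar: complete row key plus a column label.
value = df.loc[("ESM7", day), "PX_LAST"]

# Cross-section on the inner level: one date across all contracts.
one_date = df.xs(day, level="date")
```

Timing each of these lookups separately (for example with timeit) would show which access pattern dominates; the sketch itself makes no claim about how fast any of them is on the real data.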

Michael
  • ix is deprecated in favor of loc & iloc, but I'd guess the speed is similar. Have you tried at or iat? See here: https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat or here: https://pandas.pydata.org/pandas-docs/stable/indexing.html#fast-scalar-value-getting-and-setting – JohnE Jul 07 '17 at 14:30
  • Yes, but it seems like iat and at aren't compatible with a MultiIndex – Michael Jul 07 '17 at 14:33
  • Ah, good point. I don't have any ideas, I'm afraid, other than checking how loc/iloc compare to ix. Pandas indexing is generally considered to be very good compared to most alternatives. I think you'd have to provide more information about the data for someone else here to give you a suggestion for speeding things up. – JohnE Jul 07 '17 at 14:44
  • Column access is faster than row access in pandas. You can think of a DataFrame as 'a list of lists': each _column_ is a NumPy array, which is why each column has a dtype, while each row can hold different types. So pulling out a column just means grabbing the array for that column and returning it, but grabbing rows means it has to go to each column, extract the right entry, build new rows, and give those back to you. However, I think the cost of grabbing N columns will be the same as the cost of grabbing 1 column, if that helps. – Corley Brigman Jul 07 '17 at 14:49
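The column-versus-row point in the last comment can be seen in a small sketch. The frame and its column names (price, volume, flag) are hypothetical and unrelated to the data in the question; the example only shows that a column keeps its own dtype, while a row has to be assembled across columns and upcast to a common dtype.

```python
import pandas as pd

# Hypothetical frame with three differently typed columns.
df = pd.DataFrame(
    {
        "price": [100.0, 100.25, 100.5, 100.75, 101.0],
        "volume": [0, 1, 2, 3, 4],
        "flag": [True, False, True, False, True],
    }
)

# Column access: each column comes back as a Series with its own dtype.
price = df["price"]
volume = df["volume"]

# Row access: one value is taken from every column and the result is
# upcast to a common dtype (object here, since float, int and bool mix).
row = df.iloc[2]

# For many scalar reads from one column, taking the column out once as a
# NumPy array and indexing that by position avoids repeated pandas lookups.
prices = df["price"].to_numpy()
third_price = prices[2]
```

If row-wise reads dominate, one option along these lines is to select the needed column first and then index into it, rather than building whole rows.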

0 Answers