from pandas import DataFrame
import time

data = []
for i in range(3000):
    data.append(['SH601318', 'abcdef', 0.0001215, 0.000215, 0.125, 0.243])
df = DataFrame(data)
df.columns = ['symbol', 'name', 'total_ratio', 'outstanding_ratio', 'avg_total_ratio', 'avg_outstanding_ratio']

t = time.time()
result = [{
    'symbol': df.at[i, 'symbol'],
    'name': df.at[i, 'name'],
    'total_ratio': df.at[i, 'total_ratio'],
    'outstanding_ratio': df.at[i, 'outstanding_ratio'],
    'avg_total_ratio': df.at[i, 'avg_total_ratio'],
    'avg_outstanding_ratio': df.at[i, 'avg_outstanding_ratio'],
} for i in range(len(df))]
print('%.2f seconds' % (time.time() - t))
# 0.25 seconds

t = time.time()
result = [df.iloc[i].to_dict() for i in range(len(df))]
print('%.2f seconds' % (time.time() - t))
# 0.58 seconds

I tried two ways to convert a DataFrame to a list of dicts, but both are very slow: 250 ms and 580 ms! That is far longer than the database query that produces the data. I don't understand why it takes so long; after all, working in memory should be quicker than reading from disk. I expected this to take around 10 ms. Is there any way to achieve that?

gzc
  • Why not `df.to_dict(orient='records')`? – Zero Oct 12 '16 at 10:24
  • I can only surmise that the extra time comes from the loop: each iteration creates a Series and also pays the overhead of a `to_dict` call. – juanpa.arrivillaga Oct 12 '16 at 10:29
  • @JohnGalt I overlooked your answer in [another question](http://stackoverflow.com/questions/29815129/pandas-dataframe-to-list-of-dictionaries-dics) by mistake. – gzc Oct 12 '16 at 10:30

1 Answer

I think you need `to_dict` with the parameter `orient='records'`:

print(df.to_dict(orient='records'))
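
To check this suggestion against the per-row approach from the question, a rough timing sketch could look like the one below. It assumes Python 3 (the question's code uses Python 2 `print` statements) and rebuilds the same 3000-row frame. The variable names `per_row`, `records`, `per_row_seconds` and `records_seconds` are illustrative only, and the timings will vary with the machine and the pandas version, so none are quoted here.

```python
import time

from pandas import DataFrame

# Same synthetic data as in the question: 3000 identical rows.
columns = ['symbol', 'name', 'total_ratio', 'outstanding_ratio',
           'avg_total_ratio', 'avg_outstanding_ratio']
row = ['SH601318', 'abcdef', 0.0001215, 0.000215, 0.125, 0.243]
df = DataFrame([row] * 3000, columns=columns)

# Per-row approach from the question: creates one Series per row.
t = time.perf_counter()
per_row = [df.iloc[i].to_dict() for i in range(len(df))]
per_row_seconds = time.perf_counter() - t

# Single call: pandas builds every row dict in one method call.
t = time.perf_counter()
records = df.to_dict(orient='records')
records_seconds = time.perf_counter() - t

print('per-row iloc:     %.4f seconds' % per_row_seconds)
print('to_dict(records): %.4f seconds' % records_seconds)
```

Both approaches produce one dict per row keyed by column name, so `records` should be usable as a drop-in replacement for the `result` list in the question.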
jezrael