My goal is to sort a DataFrame by one column and return a JSON object as efficiently as possible.
To reproduce, define the following DataFrame:
import pandas as pd
import numpy as np
test = pd.DataFrame(data={'a':[np.random.randint(0,100) for i in range(10000)], 'b':[i + np.random.randint(0,100) for i in range(10000)]})
    a   b
0  74  89
1  55  52
2  53  39
3  26  21
4  69  34
What I need to do is sort by column a and then encode the output as a JSON object. I'm taking the basic approach:
test.sort_values('a', ascending=True, inplace=True)  # n log n
data = []  # 1 (start empty so the output has no stray {} entry)
for d in test.itertuples():  # n times
    to_append = {'id': d.Index, 'data': {'a': d.a, 'b': d.b}}  # 3
    data.append(to_append)  # 1
So is the cost n log n + 4n? Is there a more efficient way of doing this?
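For comparison, here is a rough sketch of one alternative I could try. It pulls each column out as a plain Python list and builds the records in a single comprehension instead of using itertuples. The function name `sorted_records` and the fixed seed are my own choices for illustration, and I haven't benchmarked it. The sort still dominates, so this could only reduce the constant factor on the linear part, not the n log n term.

```python
import json

import numpy as np
import pandas as pd


def sorted_records(df, by="a"):
    """Sort df by one column and build the nested records (hypothetical helper)."""
    ordered = df.sort_values(by, ascending=True)  # O(n log n)
    # tolist() yields plain Python ints, which the json module can encode.
    ids = ordered.index.tolist()
    a_vals = ordered["a"].tolist()
    b_vals = ordered["b"].tolist()
    # One pass over the rows: O(n)
    return [
        {"id": i, "data": {"a": a, "b": b}}
        for i, a, b in zip(ids, a_vals, b_vals)
    ]


# Seed chosen arbitrarily so the example is reproducible (an assumption,
# the DataFrame in the question is unseeded).
rng = np.random.default_rng(0)
test = pd.DataFrame({
    "a": rng.integers(0, 100, size=10000),
    "b": np.arange(10000) + rng.integers(0, 100, size=10000),
})

records = sorted_records(test)
payload = json.dumps(records)  # the actual JSON string
```

Here `records` has the same shape as `data` in my loop above, and `payload` is the encoded JSON text.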