
The following code works; however, I'd like to learn whether there is a more efficient way of writing to a DataFrame than one row at a time.

I have a JSON response, workoutSamples, which contains some nested JSON at the "data" node that I am adding to a DataFrame. For example, is there an easier way to pull all of the d['cadence'] values below into the 'cadence' column as one insert for the entire column, rather than looping through each node and inserting one row at a time?

workouts_stats_intra = pd.DataFrame(
    columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed'])

for d in workoutSamples:
    workoutId = d["workoutId"]
    logger.info("Workout Details: " + str(workoutId))
    for row in d["data"]:
        workouts_stats_intra.loc[len(workouts_stats_intra)] = [workoutId, row["seconds_since_pedaling_start"],
                                                                   row["cadence"], row["distance"], row["heart_rate"],
                                                                   row["power"], row["speed"]]
jpp
Ethanopp
  • Can you add data sample of json file? – jezrael Mar 30 '18 at 06:01
  • Possible duplicate of [How to append rows in a pandas dataframe in a for loop?](https://stackoverflow.com/questions/31674557/how-to-append-rows-in-a-pandas-dataframe-in-a-for-loop) – jpp Mar 30 '18 at 11:08

2 Answers


I think it is better to build a list by appending tuples, then pass it to the DataFrame constructor once:

L = []
for d in workoutSamples:
    workoutId = d["workoutId"]
    logger.info("Workout Details: " + str(workoutId))
    for row in d["data"]:
        L.append((workoutId, row["seconds_since_pedaling_start"],
                             row["cadence"], row["distance"], row["heart_rate"],
                             row["power"], row["speed"]))

workouts_stats_intra = pd.DataFrame(L, 
    columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed'])

Another possible solution is to use json_normalize.
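A minimal sketch of that approach, using `pd.json_normalize` (a top-level function in pandas ≥ 1.0; on older versions import it from `pandas.io.json`). The sample data here is hypothetical, mirroring the structure described in the question:

```python
import pandas as pd

# Hypothetical sample mirroring the question's JSON structure
workoutSamples = [
    {"workoutId": 1,
     "data": [{"seconds_since_pedaling_start": 0, "cadence": 80,
               "distance": 0.0, "heart_rate": 120, "power": 150, "speed": 25.0},
              {"seconds_since_pedaling_start": 1, "cadence": 82,
               "distance": 0.01, "heart_rate": 121, "power": 155, "speed": 25.5}]},
]

# Flatten each sample's "data" records; meta=["workoutId"] carries the
# parent id down onto every flattened row
workouts_stats_intra = pd.concat(
    [pd.json_normalize(d, record_path="data", meta=["workoutId"])
     for d in workoutSamples],
    ignore_index=True)

# Rename to match the original column name
workouts_stats_intra = workouts_stats_intra.rename(
    columns={"seconds_since_pedaling_start": "seconds"})
```

This avoids the per-row `.loc` inserts entirely: each workout's nested records become a small DataFrame, and one `concat` stitches them together.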

jezrael
  • How do I append with json_normalize? workout_details = pd.DataFrame( columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed']) for d in workoutSamples: workoutId = d["workoutId"] logger.info("Workout Details: " + str(workoutId)) workout_details = json_normalize(d,"data",['workoutId']) – Ethanopp Mar 30 '18 at 17:02

For a huge number of columns, the code below will work faster:

test = [{'Id' :1, 'Data':[{'Col1': 1, 'Col2':1, 'Col3':1}, {'Col1': 1, 'Col2':1, 'Col3':1}]},
       {'Id' :2, 'Data':[{'Col1': 2, 'Col2':2, 'Col3':2}]},
       {'Id' :3, 'Data':[{'Col1': 3, 'Col2':3, 'Col3':3}]}]

pd.concat(
    [pd.concat([pd.Series([t['Id']] * len(t['Data']), name='Id'),
                pd.DataFrame.from_dict(t['Data'])], axis=1)
     for t in test],
    axis=0)
Aritesh