
The following code works; however, I'd like to learn whether there is a more efficient way of writing to a DataFrame than one row at a time.

I have a JSON response, workoutSamples, which contains some nested JSON at the "data" node that I am adding to a DataFrame. For example, is there an easier way to pull all of the d['cadence'] values below into the 'cadence' column as one insert for the entire column, rather than looping through each node and inserting one row at a time?

workouts_stats_intra = pd.DataFrame(
    columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed'])

for d in workoutSamples:
    workoutId = d["workoutId"]
    logger.info("Workout Details: " + str(workoutId))
    for row in d["data"]:
        workouts_stats_intra.loc[len(workouts_stats_intra)] = [workoutId, row["seconds_since_pedaling_start"],
                                                                   row["cadence"], row["distance"], row["heart_rate"],
                                                                   row["power"], row["speed"]]
jpp
Ethanopp
  • Can you add data sample of json file? – jezrael Mar 30 '18 at 06:01
  • Possible duplicate of [How to append rows in a pandas dataframe in a for loop?](https://stackoverflow.com/questions/31674557/how-to-append-rows-in-a-pandas-dataframe-in-a-for-loop) – jpp Mar 30 '18 at 11:08

2 Answers


I think it is better to build a list by appending tuples, then pass it to the DataFrame constructor once:

L = []
for d in workoutSamples:
    workoutId = d["workoutId"]
    logger.info("Workout Details: " + str(workoutId))
    for row in d["data"]:
        L.append((workoutId, row["seconds_since_pedaling_start"],
                             row["cadence"], row["distance"], row["heart_rate"],
                             row["power"], row["speed"]))

workouts_stats_intra = pd.DataFrame(L, 
    columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed'])

Another possible solution is to use json_normalize.
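A minimal sketch of that approach, using `pd.json_normalize` (a top-level function in pandas ≥ 1.0; on older versions import it from `pandas.io.json`). The sample data here is hypothetical, mirroring the structure described in the question:

```python
import pandas as pd

# Hypothetical sample mirroring the question's JSON structure
workoutSamples = [
    {"workoutId": 1,
     "data": [{"seconds_since_pedaling_start": 0, "cadence": 80,
               "distance": 0.0, "heart_rate": 120, "power": 150, "speed": 25.0},
              {"seconds_since_pedaling_start": 1, "cadence": 82,
               "distance": 0.01, "heart_rate": 121, "power": 155, "speed": 25.5}]},
]

# Flatten each sample's "data" records; meta=["workoutId"] carries the
# parent id down onto every flattened row
workouts_stats_intra = pd.concat(
    [pd.json_normalize(d, record_path="data", meta=["workoutId"])
     for d in workoutSamples],
    ignore_index=True)

# Rename to match the original column name
workouts_stats_intra = workouts_stats_intra.rename(
    columns={"seconds_since_pedaling_start": "seconds"})
```

This avoids the per-row `.loc` inserts entirely: each workout's nested records become a small DataFrame, and one `concat` stitches them together.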

jezrael
  • How do I append with json_normalize? workout_details = pd.DataFrame( columns=['workoutId', 'seconds', 'cadence', 'distance', 'heart_rate', 'power', 'speed']) for d in workoutSamples: workoutId = d["workoutId"] logger.info("Workout Details: " + str(workoutId)) workout_details = json_normalize(d,"data",['workoutId']) – Ethanopp Mar 30 '18 at 17:02

For a huge number of columns, the code below will work faster:

test = [{'Id' :1, 'Data':[{'Col1': 1, 'Col2':1, 'Col3':1}, {'Col1': 1, 'Col2':1, 'Col3':1}]},
       {'Id' :2, 'Data':[{'Col1': 2, 'Col2':2, 'Col3':2}]},
       {'Id' :3, 'Data':[{'Col1': 3, 'Col2':3, 'Col3':3}]}]

pd.concat(
    [pd.concat([pd.Series([t['Id']] * len(t['Data']), name='Id'),
                pd.DataFrame.from_dict(t['Data'])], axis=1)
     for t in test],
    axis=0)
Aritesh