0

I have a "travel_by_mode" data-frame that I would like to clean up.

Basically, I'd like to show the total travel distance by vehicle mode for each individual household ID, indicated in the 'hhid' column. My data-frame is as follows:

travel_by_mode = pd.DataFrame({'hhid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], 'mode': ['Dala-Dala', 'Walk', 'BRT', 'Dala-Dala', 'Private Vehicle', 'Dala-Dala', 'Walk', 'Private Vehicle', 'Dala-Dala', 'Walk', 'BRT'],  'length_in_km': [4,5,6,7,8,7,6,7,8,9,10]})

print(travel_by_mode)

I would like to create a new data frame that would show the total travel distance for each mode by hhid.

So it would look something like this for the first household:

travel_by_mode2 = pd.DataFrame({'hhid': 1, 'distance_dala': 4, 'distance_walk': 4, 'distance_brt': 6,'distance_private_vehicle': 0})

print(travel_by_mode2)

In the end I would like to have a new dataframe that shows the total travel distance for each vehicle mode.

Help?

Thank you!

Lulu
  • 5
  • 5

1 Answers1

0

I think you're looking for a pivot_table:

In [11]: travel_by_mode.pivot_table(index='hhid', columns='mode', values='length_in_km', fill_value=0)
Out[11]:
mode  BRT  Dala-Dala  Private Vehicle  Walk
hhid
1       6          4                0     5
2       0          7                8     0
3       0          7                0     6
4       0          8                7     0
5      10          0                0     9
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535