
I was wondering if anyone could help me with parallel coordinate plotting.

The data is derived from: https://data.cityofnewyork.us/Transportation/2016-Yellow-Taxi-Trip-Data/k67s-dv2t

I'm trying to normalise some features and then compute the mean trip distance, passenger count and payment amount for each day of the week.

import pandas as pd
from pandas.plotting import parallel_coordinates

features = ['trip_distance','passenger_count','payment_amount']

#normalizing data (min-max scaling to the range 0..1)
for feature in features:
    df[feature] = (df[feature]-df[feature].min())/(df[feature].max()-df[feature].min())

#change format to datetime
pickup_time = pd.to_datetime(df['pickup_datetime'], format ='%d/%m/%y %H:%M')
#fill dayofweek column with 0~6 0:Monday and 6:Sunday
df['dayofweek'] = pickup_time.dt.weekday

mean_trip = df.groupby('dayofweek').trip_distance.mean()
mean_passanger = df.groupby('dayofweek').passenger_count.mean()
mean_payment = df.groupby('dayofweek').payment_amount.mean()

#parallel_coordinates('notsurewattoput')

Printing mean_trip shows the mean for each day of the week, but I'm not sure how to use this to draw a parallel coordinates plot with all 3 means on the same plot.

Does anyone know how to implement this?

vestland
Min

1 Answer


I think you can replace the three separate mean aggregations with a single one that returns a DataFrame instead of three Series. Change:

mean_trip = df.groupby('dayofweek').trip_distance.mean()
mean_passanger = df.groupby('dayofweek').passenger_count.mean()
mean_payment = df.groupby('dayofweek').payment_amount.mean()

to:

from pandas.plotting import parallel_coordinates

cols = ['trip_distance','passenger_count','payment_amount']
df1 = df.groupby('dayofweek', as_index=False)[cols].mean()
#https://stackoverflow.com/a/45082022
parallel_coordinates(df1, class_column='dayofweek', cols=cols)
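For anyone who wants to try this without the taxi data, here is a minimal end-to-end sketch. The `toy` frame and its values are made up purely for illustration (they are not from the dataset), and the `colormap` choice is just one option; the groupby and plotting calls are the same as above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical toy data: two trips per weekday (0 = Monday ... 6 = Sunday)
toy = pd.DataFrame({
    "dayofweek": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "trip_distance": [1.0, 3.0, 2.0, 2.0, 0.5, 1.5, 4.0, 6.0,
                      3.0, 5.0, 7.0, 9.0, 2.0, 8.0],
    "passenger_count": [1.0, 2.0, 1.0, 1.0, 3.0, 1.0, 2.0, 2.0,
                        4.0, 2.0, 5.0, 1.0, 1.0, 3.0],
    "payment_amount": [8.0, 12.0, 9.5, 10.5, 6.0, 7.0, 15.0, 21.0,
                       13.0, 17.0, 25.0, 30.0, 9.0, 27.0],
})
cols = ["trip_distance", "passenger_count", "payment_amount"]

# Min-max scale every feature to the range 0..1 in one vectorised step
toy[cols] = (toy[cols] - toy[cols].min()) / (toy[cols].max() - toy[cols].min())

# One row per weekday, with dayofweek kept as a regular column
toy_means = toy.groupby("dayofweek", as_index=False)[cols].mean()

# One line per weekday, one vertical axis per feature
ax = parallel_coordinates(toy_means, class_column="dayofweek",
                          cols=cols, colormap="viridis")
```

To actually see the figure, drop the `matplotlib.use("Agg")` line and call `matplotlib.pyplot.show()`, or save it with `ax.figure.savefig(...)`.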
jezrael
  • Thank you so much again! Worked like magic! Didn't know you could combine them like that. Thank you! – Min Mar 28 '18 at 11:42
  • Just one question: what does as_index=False do? – Min Mar 28 '18 at 11:43
  • It creates a column from `dayofweek`; it works the same as `df.groupby('dayofweek')[cols].mean().reset_index()` – jezrael Mar 28 '18 at 11:44
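A small sketch of the equivalence described in the last comment. The `demo` frame is a made-up example, not part of the original data.

```python
import pandas as pd

# Hypothetical three-row frame, just to compare the two forms
demo = pd.DataFrame({"dayofweek": [0, 0, 1],
                     "trip_distance": [1.0, 3.0, 2.0]})

# Default behaviour: the group key becomes the index of the result
indexed = demo.groupby("dayofweek")[["trip_distance"]].mean()

# as_index=False: the group key stays an ordinary column
flat = demo.groupby("dayofweek", as_index=False)[["trip_distance"]].mean()

# Both routes should give the same frame
same = flat.equals(indexed.reset_index())
```

Having `dayofweek` as a regular column matters here because `parallel_coordinates` looks up `class_column` among the frame's columns, not in its index.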