
I am trying to get the month with the most passenger trips for each route. Currently my code gets the correct maximum passenger trips for each route, but gives a busiest month of 12 for all of them. What have I missed? (CSV of data used)

    import pandas as pd
    df = pd.read_csv("data.csv")
    df.rename(columns={'MonthNum': 'BusiestMonth', 'PassengerTrips': 'Numtrips'}, inplace=True)
    df = df[['City1', 'City2', 'BusiestMonth', 'Numtrips']]
    df = df.groupby(['City1', 'City2']).max('Numtrips')  # doesn't get the busiest month, just gets December
    print("Each route's busiest month:")
    print(df)

Output: (screenshot not preserved)
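To see why every route reports 12, here is a minimal reproduction with made-up numbers (the DataFrame below is illustrative and is not the linked CSV). `GroupBy.max()` takes the maximum of each column independently within a group, so the two output columns generally come from different rows.

```python
import pandas as pd

# Made-up sample data (not the asker's CSV): two routes, three months each.
df = pd.DataFrame({
    "City1": ["A", "A", "A", "B", "B", "B"],
    "City2": ["X", "X", "X", "Y", "Y", "Y"],
    "BusiestMonth": [3, 7, 12, 1, 6, 12],
    "Numtrips": [500, 900, 400, 800, 300, 200],
})

# max() is computed for each column separately within each group,
# so BusiestMonth becomes the largest month number in the group (12),
# not the month in which Numtrips peaked.
per_column_max = df.groupby(["City1", "City2"]).max()
print(per_column_max)
```

Route (A, X) peaks at 900 trips in month 7, yet the result pairs 900 with month 12, which is exactly the symptom described above.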

SQL Learner 1
That is not how [groupby max](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.max.html) works. Its first parameter is the boolean `numeric_only=False`. Because non-empty strings are truthy in Python, you've effectively called `groupby([...]).max(numeric_only=True)`, which I don't think was your intention. It is, however, the _reason_ you're only getting 12: `max` is taken over each column separately, so `BusiestMonth` is always the largest month number in the group. – Henry Ecker Jul 29 '21 at 04:25
I'd recommend sort and drop, but any of the options in the duplicate work: `df = df.sort_values(['City1', 'City2', 'Numtrips']).drop_duplicates(['City1', 'City2'], keep='last')` – Henry Ecker Jul 29 '21 at 04:28
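The sort-and-drop approach from the comment can be checked on made-up data (illustrative only, not the asker's CSV). The `idxmax` variant is one common alternative added here for comparison; it is an assumption on my part and is not taken from the linked duplicate.

```python
import pandas as pd

# Made-up sample data (not the asker's CSV).
df = pd.DataFrame({
    "City1": ["A", "A", "A", "B", "B", "B"],
    "City2": ["X", "X", "X", "Y", "Y", "Y"],
    "BusiestMonth": [3, 7, 12, 1, 6, 12],
    "Numtrips": [500, 900, 400, 800, 300, 200],
})

# Approach from the comment: sort so each route's largest Numtrips comes
# last, then keep only that last row per (City1, City2) pair.
by_sort = (
    df.sort_values(["City1", "City2", "Numtrips"])
      .drop_duplicates(["City1", "City2"], keep="last")
)

# Alternative: idxmax returns the index label of the row with the largest
# Numtrips in each group; .loc then selects those whole rows.
by_idxmax = df.loc[df.groupby(["City1", "City2"])["Numtrips"].idxmax()]

print(by_sort)
print(by_idxmax)
```

Both keep whole rows, so `BusiestMonth` stays paired with its own `Numtrips`: month 7 for (A, X) and month 1 for (B, Y). If two months tie for the most trips, the two approaches may keep different rows; `idxmax` returns the first occurrence of the maximum.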
