
I've read this thread about grouping and getting max: Apply vs transform on a group object.

It works perfectly and is helpful when the max is unique within a group, but I'm running into an issue: I need to ignore duplicates within a group, take the max of the remaining unique values, and then put the result back into the DataFrame.

Input (named df1):

date       val
2004-01-01 0
2004-02-01 0
2004-03-01 0
2004-04-01 0
2004-05-01 0 
2004-06-01 0
2004-07-01 0
2004-08-01 0
2004-09-01 0
2004-10-01 0
2004-11-01 0
2004-12-01 0
2005-01-01 11
2005-02-01 11
2005-03-01 8
2005-04-01 5
2005-05-01 0 
2005-06-01 0
2005-07-01 2
2005-08-01 1
2005-09-01 0
2005-10-01 0
2005-11-01 3
2005-12-01 3

My code:

df1['peak_month'] = df1.groupby(df1.date.dt.year)['val'].transform('max') == df1['val']

My Output:

date       val   peak_month
2004-01-01 0     true #notice how all duplicates are true in 2004
2004-02-01 0     true
2004-03-01 0     true
2004-04-01 0     true
2004-05-01 0     true
2004-06-01 0     true
2004-07-01 0     true
2004-08-01 0     true
2004-09-01 0     true
2004-10-01 0     true
2004-11-01 0     true
2004-12-01 0     true
2005-01-01 11    true #notice how these two values 
2005-02-01 11    true #are the max values for 2005 and are true
2005-03-01 8     false
2005-04-01 5     false
2005-05-01 0     false 
2005-06-01 0     false
2005-07-01 2     false
2005-08-01 1     false
2005-09-01 0     false
2005-10-01 0     false
2005-11-01 3     false
2005-12-01 3     false

Expected Output:

date       val   peak_month
2004-01-01 0     false #notice how all duplicates are false in 2004
2004-02-01 0     false #because they are the same and all vals are max
2004-03-01 0     false
2004-04-01 0     false
2004-05-01 0     false 
2004-06-01 0     false
2004-07-01 0     false
2004-08-01 0     false
2004-09-01 0     false
2004-10-01 0     false
2004-11-01 0     false
2004-12-01 0     false
2005-01-01 11    false #notice how these two values 
2005-02-01 11    false #are the max values for 2005 but are false
2005-03-01 8     true  #this is the second max val and is true
2005-04-01 5     false
2005-05-01 0     false 
2005-06-01 0     false
2005-07-01 2     false
2005-08-01 1     false
2005-09-01 0     false
2005-10-01 0     false
2005-11-01 3     false
2005-12-01 3     false

For reference:

import pandas as pd

df1 = pd.DataFrame({'val': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 8, 5, 0, 0, 2, 1, 0, 0, 3, 3],
                    'date': ['2004-01-01', '2004-02-01', '2004-03-01', '2004-04-01', '2004-05-01', '2004-06-01',
                             '2004-07-01', '2004-08-01', '2004-09-01', '2004-10-01', '2004-11-01', '2004-12-01',
                             '2005-01-01', '2005-02-01', '2005-03-01', '2005-04-01', '2005-05-01', '2005-06-01',
                             '2005-07-01', '2005-08-01', '2005-09-01', '2005-10-01', '2005-11-01', '2005-12-01']})
df1['date'] = pd.to_datetime(df1['date'])
ethanenglish
  • This question is not clear and you have too much data to illustrate your point. I don't know why you want to ignore duplicates. The max of [5, 5, 2, 2] is the same as the max of [5, 2]. – Alexander Mar 08 '16 at 18:02
  • I need one value for max year or none if they're the same. – ethanenglish Mar 08 '16 at 18:44

1 Answer


Not the slickest solution, but it works. The idea is to first determine the unique values appearing in each year, and then do your transform just on those unique values.

# Determine the values that appear exactly once within each year.
df1['year'] = df1.date.dt.year
unique_vals = df1.drop_duplicates(subset=['year', 'val'], keep=False)

# Max transform on the unique values; rows dropped above align as NaN.
df1['peak_month'] = unique_vals.groupby('year')['val'].transform('max') == unique_vals['val']

# Fill the NaN's as False and drop the helper column.
df1['peak_month'] = df1['peak_month'].fillna(False)
df1 = df1.drop('year', axis=1)
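For comparison, the same idea can be done without dropping rows: mask the duplicated values with NaN so the per-year transform ignores them, and the `NaN == NaN` comparison turns duplicate rows into `False` automatically. A minimal sketch on a shortened version of the example data:

```python
import pandas as pd

# Shortened version of the example data.
df1 = pd.DataFrame({
    'val': [0, 0, 11, 11, 8, 5],
    'date': pd.to_datetime(['2004-01-01', '2004-02-01',
                            '2005-01-01', '2005-02-01',
                            '2005-03-01', '2005-04-01']),
})

year = df1['date'].dt.year
# Mark every row whose (year, val) pair occurs more than once.
dup = df1.assign(year=year).duplicated(['year', 'val'], keep=False)
# Replace duplicated values with NaN so they can't win the max.
masked = df1['val'].where(~dup)
# NaN compares unequal to everything, so duplicated rows come out False.
df1['peak_month'] = masked.groupby(year).transform('max') == masked
```

This avoids the temporary `year` column and the `fillna` step, at the cost of the extra `masked` series.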
root
  • No, the `keep=False` keyword argument forces `drop_duplicates` to drop all copies of the duplicate data. Without this keyword argument your concern would be valid, as `drop_duplicates` keeps the first duplicate record by default. My code produces the expected output. – root Mar 08 '16 at 18:11
  • @Parfait That works like a charm. Thanks for taking a look and stepping through the logic! – ethanenglish Mar 08 '16 at 18:55