I am trying to show the first row by group (in this case, Car is the group). When I try to do this with the data below, however, my code shows 45 for the time of Fred (which is actually Betsy's time from the row below). I would like the output to show the first full row for Car A & the first full row for Car B even if they have np.nan in the time column.

Can someone help me understand what I'm doing wrong and why my code would be combining row information like this?

Thanks!

import numpy as np  # needed for np.nan below
import pandas as pd

test_df = pd.DataFrame({'Race':[1,1,1,2,2,2],'Car':['A','A','A','B','B','B'], 'Date':['5/1/2019','4/15/2019','3/1/2019','5/1/2019','2/1/2019','1/5/2019'],
                        'Driver':['Fred','Betsy','John','John','Frank','Frank'],'Time':[np.nan,45,46,47,44,43]})

# Sort, then take the first entry per car (this is the call that mixes rows)
test_df = test_df.sort_values(['Race', 'Car', 'Date'], ascending=[True, True, False]).groupby(['Car'], as_index=False).first()

newcoder
  • This post has some more details: https://stackoverflow.com/questions/55583246/what-is-different-between-groupby-first-groupby-nth-groupby-head-when-as-index/55583395#55583395 – ALollz May 02 '19 at 03:28
  • Thanks much, ALollz. This is a helpful post. – newcoder May 02 '19 at 18:39

2 Answers

Use .head(1) instead of .first():
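
A minimal sketch of that change, assuming the test_df from the question (with numpy imported as np, which the question's snippet omits). The name first_rows is only illustrative:

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({
    'Race': [1, 1, 1, 2, 2, 2],
    'Car': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': ['5/1/2019', '4/15/2019', '3/1/2019', '5/1/2019', '2/1/2019', '1/5/2019'],
    'Driver': ['Fred', 'Betsy', 'John', 'John', 'Frank', 'Frank'],
    'Time': [np.nan, 45, 46, 47, 44, 43],
})

# head(1) keeps the first row of each group intact, NaN values included
first_rows = (
    test_df.sort_values(['Race', 'Car', 'Date'], ascending=[True, True, False])
           .groupby('Car')
           .head(1)
)
print(first_rows)
```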

Output:

   Race Car      Date Driver  Time
0     1   A  5/1/2019   Fred   NaN
3     2   B  5/1/2019   John  47.0

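The difference between the two is how NaN is treated: first() returns the first non-null value in each column of a group, so values from different rows can end up combined, whereas head(1) returns the first row of each group unchanged.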
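
To see the NaN handling side by side, here is a small sketch on a hypothetical two-row group (the frame and the names mixed and intact are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical group whose first row has a missing Time
df = pd.DataFrame({
    'Car': ['A', 'A'],
    'Driver': ['Fred', 'Betsy'],
    'Time': [np.nan, 45.0],
})

# first() works column by column and skips nulls,
# so Driver comes from row 0 while Time comes from row 1
mixed = df.groupby('Car', as_index=False).first()

# head(1) selects whole rows, so the NaN in row 0 is preserved
intact = df.groupby('Car').head(1)

print(mixed)
print(intact)
```

Here mixed pairs Fred with 45.0, which is the behaviour described in the question, while intact keeps Fred with NaN.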

Ji Wei

Use nth(0, dropna=False) instead of first():

test_df = test_df.sort_values(['Race', 'Car', 'Date'], ascending=[True, True, False]).groupby(['Car'], as_index=False).nth(0, dropna=False)

Output:

   Race Car      Date Driver  Time
0     1   A  5/1/2019   Fred   NaN
3     2   B  5/1/2019   John  47.0

vb_rises
  • Great solution, Vishal. Thanks for taking time to answer my question. I tried to upvote but can't given my limited history on the site. Regardless, I appreciate your help! Thanks. – newcoder May 02 '19 at 03:06