What is the difference between pd.groupby().first() and pd.groupby().min()?

Question

I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Point_ID': [1, 2, 3, 1, 2, 1], 'Shape_ID': [84, 85, 86, 87, 88, 89], 'LOL': [0, 1, 0, 1, np.nan, np.nan]})

Out[1116]: 
   LOL  Point_ID  Shape_ID
0  0.0         1        84
1  1.0         2        85
2  0.0         3        86
3  1.0         1        87
4  NaN         2        88
5  NaN         1        89

When I run:

df.groupby('Point_ID').last()
Out[1114]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

For Shape_ID it returned the last value, but shouldn't LOL be NaN for Point_ID 1 and 2?

Using max(), I get the same answer as with last() when the DataFrame is sorted:

df.groupby('Point_ID').max()

Out[1115]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

I have read the pandas documentation for both first() and last() but cannot find the answer. Can anyone help? Much appreciated :-)

Not sure I understand your question but `first()` and `last()` return the first and last element in the group. It's pretty straightforward. If `1` is your key, the last `LOL` is `-1`. — o-90, Aug 17 '17 at 20:54
It's not returning the min value of LOL, just the corresponding value for the last row of each Point_ID — Vaishali, Aug 17 '17 at 20:55

score 2 · Accepted Answer · answered Aug 17 '17 at 20:56

Demo:

Let's shuffle your DataFrame:

In [339]: df = df.sample(frac=1)

In [340]: df
Out[340]:
   LOL  Point_ID  Shape_ID
4    0         2        88
0    0         1        84
1    0         2        85
3    1         1        87
2    0         3        86
5   -1         1        89

In [341]: df.groupby('Point_ID').min()
Out[341]:
          LOL  Shape_ID
Point_ID
1          -1        84
2           0        85  #  <----
3           0        86

In [342]: df.groupby('Point_ID').first()
Out[342]:
          LOL  Shape_ID
Point_ID
1           0        84
2           0        88  #  <----
3           0        86
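The shuffle above is random, so a reproducible version may help. The following is a minimal sketch that hard-codes the row order shown in Out[340] (the variable names `smallest`, `earliest` and `sorted_first` are only illustrative). It also sorts by Shape_ID to show why first() and min() can agree on a sorted column while still differing on the others:

```python
import pandas as pd

# Rows in the shuffled order shown in Out[340] above
df = pd.DataFrame(
    {'LOL': [0, 0, 0, 1, 0, -1],
     'Point_ID': [2, 1, 2, 1, 3, 1],
     'Shape_ID': [88, 84, 85, 87, 86, 89]},
    index=[4, 0, 1, 3, 2, 5],
)

grouped = df.groupby('Point_ID')
smallest = grouped.min()    # smallest value per column, independent of row order
earliest = grouped.first()  # first non-null value per column, in current row order

# After sorting by Shape_ID, first() matches min() for Shape_ID only;
# LOL still comes from whichever row happens to be first in each group.
sorted_first = df.sort_values('Shape_ID').groupby('Point_ID').first()

print(smallest)
print(earliest)
print(sorted_first)
```

In this sketch, `sorted_first` agrees with `smallest` on Shape_ID, but for Point_ID 1 its LOL is 0 rather than the minimum of -1.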

@Wen, you might want to check this [question & answer](https://stackoverflow.com/questions/38797271/get-first-and-last-values-in-a-groupby) and [this](https://stackoverflow.com/questions/45744800/why-doesnt-first-and-last-in-a-groupby-give-me-first-and-last) — MaxU - stand with Ukraine, Aug 17 '17 at 21:01

Vaishali · Answer 2 · 2017-08-17T21:11:46.060

It's just returning the values corresponding to the last row of each Point_ID.

Consider this df, in which I added a row to your sample:

    LOL Point_ID    Shape_ID
0   0   1           84
1   0   2           85
2   0   3           86
3   1   1           87
4   0   2           88
5   -1  1           89
6   1   2           25

If you group by Point_ID:

df.groupby('Point_ID').last()

You get

        LOL Shape_ID
Point_ID        
1       -1  89
2       1   25
3       0   86

For Point_ID 2, the value in LOL happens to be the max, but it is not computed as a max: it is just the value of LOL in the last row with Point_ID 2. Note that Shape_ID there is 25, not the group max of 88.

Do go through this GitHub issue on the same topic; it says that, for the moment, skipping NaN is a feature of first/last. If you don't want that behaviour, use nth with dropna=False:

df.groupby('Point_ID').nth(-1,dropna=False)

        LOL Shape_ID
Point_ID        
1       NaN 89
2       NaN 88
3       0.0 86
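As a cross-check that does not rely on `nth`, here is a minimal sketch using the DataFrame from the question. It compares last(), which skips NaN column by column, with two ways of taking the last row of each group by position. The variable names are only illustrative, and picking `agg` with `iloc[-1]` or `tail(1)` is a swapped-in alternative, not what the GitHub issue recommends:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Point_ID': [1, 2, 3, 1, 2, 1],
    'Shape_ID': [84, 85, 86, 87, 88, 89],
    'LOL': [0, 1, 0, 1, np.nan, np.nan],
})

# last() works column by column and skips NaN, so LOL falls back to the
# last non-null value in each group.
skipping = df.groupby('Point_ID').last()

# Taking the last element positionally keeps the NaN values.
positional = df.groupby('Point_ID').agg(lambda s: s.iloc[-1])

# tail(1) also returns the last row of each group, but keeps the
# original row index instead of indexing by Point_ID.
last_rows = df.groupby('Point_ID').tail(1)

print(skipping)
print(positional)
print(last_rows)
```

Here `skipping` reproduces the output from the question (LOL is 1.0 for Point_ID 1 and 2), while `positional` and `last_rows` keep the NaN values from rows 4 and 5.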
