
I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Point_ID': [1, 2, 3, 1, 2, 1], 'Shape_ID': [84, 85, 86, 87, 88, 89], 'LOL': [0, 1, 0, 1, np.nan, np.nan]})

Out[1116]: 
   LOL  Point_ID  Shape_ID
0  0.0         1        84
1  1.0         2        85
2  0.0         3        86
3  1.0         1        87
4  NaN         2        88
5  NaN         1        89

When I ran:

df.groupby('Point_ID').last()
Out[1114]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

For Shape_ID it returned the last value, but for LOL, shouldn't it return NaN?

Using max(), I get the same answer as with last() when the DataFrame is sorted:

df.groupby('Point_ID').max()

Out[1115]: 
          LOL  Shape_ID
Point_ID               
1         1.0        89
2         1.0        88
3         0.0        86

I have been reading the pandas documentation for both functions, first and last, but cannot find the answer. Can anyone help? Much appreciated. :-)

BENY
  • Not sure I understand your question but `first()` and `last()` return the first and last element in the group. It's pretty straightforward. If `1` is your key, the last `LOL` is `-1`. – o-90 Aug 17 '17 at 20:54
  • It's not returning the min value of LOL, just the corresponding value for the last row of each Point_ID – Vaishali Aug 17 '17 at 20:55

2 Answers


Demo:

let's shuffle your DF:

In [339]: df = df.sample(frac=1)

In [340]: df
Out[340]:
   LOL  Point_ID  Shape_ID
4    0         2        88
0    0         1        84
1    0         2        85
3    1         1        87
2    0         3        86
5   -1         1        89

In [341]: df.groupby('Point_ID').min()
Out[341]:
          LOL  Shape_ID
Point_ID
1          -1        84
2           0        85  #  <----
3           0        86

In [342]: df.groupby('Point_ID').first()
Out[342]:
          LOL  Shape_ID
Point_ID
1           0        84
2           0        88  #  <----
3           0        86
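
The demo above depends on a random shuffle, so the row order will differ from run to run. As a minimal sketch, the row order can be fixed to the one printed in Out[340] so the comparison is reproducible. The LOL values are copied from that output, and the names `shuffled`, `mins`, and `firsts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Point_ID': [1, 2, 3, 1, 2, 1],
    'Shape_ID': [84, 85, 86, 87, 88, 89],
    'LOL': [0, 0, 0, 1, 0, -1],
})

# Reorder the rows to match the shuffled frame printed in Out[340]
shuffled = df.loc[[4, 0, 1, 3, 2, 5]]

# min() picks the smallest value per column within each group
mins = shuffled.groupby('Point_ID').min()

# first() picks the first value per column in row order within each group
firsts = shuffled.groupby('Point_ID').first()

print(mins)
print(firsts)
```

For Point_ID 2 the two results differ in Shape_ID (85 from min, 88 from first), which is the row marked with an arrow above.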
MaxU - stand with Ukraine
  • @Wen, you might want to check this [question & answer](https://stackoverflow.com/questions/38797271/get-first-and-last-values-in-a-groupby) and [this](https://stackoverflow.com/questions/45744800/why-doesnt-first-and-last-in-a-groupby-give-me-first-and-last) – MaxU - stand with Ukraine Aug 17 '17 at 21:01

It's just returning the values corresponding to the last row of each Point_ID.

Consider this df in which I added a row to your sample

   LOL  Point_ID  Shape_ID
0    0         1        84
1    0         2        85
2    0         3        86
3    1         1        87
4    0         2        88
5   -1         1        89
6    1         2        25

If you groupby

df.groupby('Point_ID').last()

You get

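          LOL  Shape_ID
Point_ID
1          -1        89
2           1        25
3           0        86

For Point_ID 2, the value in LOL happens to be the max, but it is not computed as a max: it is just the value of LOL in the last row with Point_ID 2. Shape_ID for that group is 25, not the max (88), and for Point_ID 1 LOL is -1, not the max (1).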

Do go through this GitHub issue on the same topic. It says that, for the moment, skipping NaN is a feature of first/last. If you don't want that behaviour, use nth with dropna=False. On your original data (with the NaN values):

df.groupby('Point_ID').nth(-1,dropna=False)

        LOL Shape_ID
Point_ID        
1       NaN 89
2       NaN 88
3       0.0 86
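
As a rough sketch of the difference described above, the snippet below uses the question's original data (with NaN in LOL) and compares last() with taking the last physical row of each group. It uses tail(1) rather than nth, on the assumption that the shape of nth's result may vary between pandas versions. The names `skipping`, `last_rows`, and `positional` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Point_ID': [1, 2, 3, 1, 2, 1],
    'Shape_ID': [84, 85, 86, 87, 88, 89],
    'LOL': [0, 1, 0, 1, np.nan, np.nan],
})

grouped = df.groupby('Point_ID')

# last() works column by column and skips NaN,
# so LOL falls back to the last non-null value in each group
skipping = grouped.last()

# tail(1) keeps the last physical row of each group, NaN included,
# and preserves the original row index
last_rows = grouped.tail(1)

# Re-index by Point_ID so the result lines up with last()
positional = last_rows.set_index('Point_ID').sort_index()

print(skipping)
print(positional)
```

Here `skipping` should show LOL as 1.0 for Point_ID 1 and 2, while `positional` shows NaN for both, with Shape_ID identical in the two results.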
Vaishali