So, df['date'] returns:

0        2018-03-01
1        2018-03-01
2        2018-03-01
3        2018-03-01
4        2018-03-01
            ...    
469796   2018-06-20
469797   2018-06-20
469798   2018-06-27
469799   2018-06-27
469800   2018-12-06
Name: date, Length: 469801, dtype: datetime64[ns]

And, df['date'].sort_values() returns:

137241   2018-01-01
378320   2018-01-01
247339   2018-01-01
34333    2018-01-01
387971   2018-01-01
            ...    
109278   2018-12-06
384324   2018-12-06
384325   2018-12-06
109282   2018-12-06
469800   2018-12-06
Name: date, Length: 469801, dtype: datetime64[ns]

Now df['date'].sort_values()[0] "ignores sorting" and returns:

Timestamp('2018-03-01 00:00:00')

Whereas df['date'].sort_values()[0:1] actually returns:

137241   2018-01-01
Name: date, dtype: datetime64[ns]

Why the apparently inconsistent behaviour? As @cs95 accurately pointed out, they return a scalar and a Series respectively, which is fine. What confuses me is the value: the first one is 2018-03-01, while the second one is 2018-01-01.

Thanks in advance.


Warning

Somewhat similar to: why sort_values() is diifferent form sort_values().values

cs95
gmagno
  • Before I set up an example, `df['date'].sort_values().iloc[0:1, :]`. Does that do as expected? – roganjosh Oct 26 '19 at 20:17
  • Er... nope, error: `Too many indexers` – gmagno Oct 26 '19 at 20:18
  • `df['date'].sort_values()[0]` returns a scalar, `df['date'].sort_values()[0:1]` returns a Series. You have to understand how python's list slicing notation works first. – cs95 Oct 26 '19 at 20:19
  • @cs95 thanks for taking the time to reply, but I still don't see how your comment helps. Perhaps you could elaborate a bit? – gmagno Oct 26 '19 at 20:26
  • @cs95 I think you are missing my question point. The returned scalar and Series are different in type (obviously) but in content/value as well. – gmagno Oct 26 '19 at 20:27
  • [0] represents a scalar, [0:1] represents a list with a single element. In the pandas world, you are returned a Series. Let me know if you have any other questions! – cs95 Oct 26 '19 at 20:39
  • [0, 1][0] returns a scalar, 0, and [0, 1][0:1] returns a list, [0]. Still both are/have 0. – gmagno Oct 26 '19 at 20:42
  • This is one of the weird things about pandas. `df['date'].sort_values()[0]` actually sorts the series but then you don't ask for the item at the first position (0) but instead you ask for the one "labeled" 0 -- the first item in the original series. You can replace it with `df['date'].sort_values().iloc[0]`. – ayhan Oct 26 '19 at 20:47
  • @ayhan, So `df['date'].sort_values()[0]` is really getting the row with index 0. I see now, thanks for clarifying mate. – gmagno Oct 26 '19 at 20:49
  • Sorry, I was initially confused by what you were asking. Reopened the question and hopefully my answer supplements ayhan's helpful comments sufficiently. Let me know if we can help with any further clarification. – cs95 Oct 26 '19 at 20:51

1 Answer

Pandas interprets a scalar key and a slice differently when you index a Series with []. Consider a simpler example:

import pandas as pd

df = pd.DataFrame({'col1': [5, 4, 3, 2, 1]}).sample(frac=1)  # shuffle the rows; the order below is one possible result
df
   col1
4     1
1     4
0     5
3     2
2     3

Also note the result of sort_values:

df['col1'].sort_values()
4    1
3    2
2    3
1    4
0    5

When you call df['col1'].sort_values()[0] you get the value whose index label is 0, not the first value in sorted order. With an integer index, a scalar key is treated as a label, so this behaves like loc:

df['col1'].sort_values()[0]     # just gets the value indexed by that key
# 5

df['col1'].sort_values().loc[0]
# 5

When you slice with integers, they are treated as positions rather than labels, so this behaves like iloc:

df['col1'].sort_values()[0:1]   # just gets the first row  
4    1
Name: col1, dtype: int64


df['col1'].sort_values().iloc[0:1]
4    1
Name: col1, dtype: int64
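
To make the contrast reproducible without the random shuffle, here is a minimal sketch that builds the same Series with a fixed index. The variable names (`s`, `sorted_s`, and so on) are mine and purely illustrative, and the behaviour shown assumes an integer index like the one in the question:

```python
import pandas as pd

# Same values as the example above, but with a fixed (non-random) index
# so the result is reproducible.
s = pd.Series([1, 4, 5, 2, 3], index=[4, 1, 0, 3, 2], name="col1")
sorted_s = s.sort_values()   # index order after sorting: 4, 3, 2, 1, 0

scalar_key = sorted_s[0]     # scalar key on an integer index: label lookup
by_label = sorted_s.loc[0]   # explicit label lookup
by_position = sorted_s.iloc[0]  # explicit positional lookup
first_slice = sorted_s[0:1]  # integer slice: positional

print(scalar_key, by_label, by_position, first_slice.index.tolist())
```

The scalar key and `loc` both return the value stored under label 0 (the largest value here), while `iloc` and the slice return the first row in sorted order.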

If you want the scalar index operation to return the same thing as the slice, use iloc, or iat for a single value:

df['col1'].sort_values().iloc[0]
# 1

df['col1'].sort_values().iat[0]
# 1
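
Applied to the dates in the question, a rough sketch might look like the following. The three-row `dates` Series is a made-up stand-in for df['date'], not the asker's data. It also shows two alternatives that avoid the label lookup: discarding the old labels with `ignore_index=True` (available in recent pandas versions), or skipping the sort entirely with `min()`:

```python
import pandas as pd

# Hypothetical stand-in for df['date']; the original data is not available.
dates = pd.Series(
    pd.to_datetime(["2018-03-01", "2018-01-01", "2018-12-06"]),
    name="date",
)

surprising = dates.sort_values()[0]     # label 0, i.e. the first row of the unsorted data
earliest = dates.sort_values().iloc[0]  # first row after sorting
# ignore_index=True relabels the sorted result 0..n-1, so label 0 is the first row.
earliest_relabelled = dates.sort_values(ignore_index=True)[0]
earliest_min = dates.min()              # no sort needed for the smallest value

print(surprising, earliest, earliest_relabelled, earliest_min)
```

If only the smallest or largest value is needed, `min()` or `max()` is usually the simpler choice.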
cs95