
I have data of the form

>>> df['image-capture_time'].iloc[-20:]
43    2022-07-19 20:08:26.603000+00:00
36    2022-07-19 20:08:28.313000+00:00
35    2022-07-19 20:08:29.571000+00:00
40    2022-07-19 20:08:30.796000+00:00
38    2022-07-19 20:08:32.062000+00:00
39    2022-07-19 20:08:33.346000+00:00
42    2022-07-19 20:08:34.579000+00:00
41    2022-07-19 20:08:35.813000+00:00
34    2022-07-19 20:08:37.062000+00:00
37    2022-07-19 20:08:38.314000+00:00
130   2022-07-22 15:12:05.925000+00:00
127   2022-07-22 15:12:07.531000+00:00
122   2022-07-22 15:12:08.765000+00:00
123   2022-07-22 15:12:10.031000+00:00
124   2022-07-22 15:12:11.298000+00:00
129   2022-07-22 15:12:12.548000+00:00
128   2022-07-22 15:12:13.781000+00:00
125   2022-07-22 15:12:15.032000+00:00
121   2022-07-22 15:12:16.298000+00:00
126   2022-07-22 15:12:17.532000+00:00
Name: image-capture_time, dtype: datetime64[ns, UTC]

where the values have been correctly sorted by increasing pandas.Timestamp. But using

iloc[df['image-capture_time'].idxmax()]

does not return the record with the maximum time:

>>> df['image-capture_time'].iloc[df['image-capture_time'].idxmax()]
Timestamp('2022-07-22 15:12:11.298000+0000', tz='UTC')
>>> df['image-capture_time'].iloc[-1]
Timestamp('2022-07-22 15:12:17.532000+0000', tz='UTC')
>>> df['image-capture_time'].idxmax()
126
>>> df['image-capture_time'].iloc[131]
Timestamp('2022-07-22 15:12:17.532000+0000', tz='UTC')

What's going on here? Clearly there's something I don't understand about iloc, idxmax, or both (or maybe even pandas.Timestamp).

orome
  • Unless the index is a range from 0 to len(df), use `loc` to index by label – mozway Jul 22 '22 at 16:25
  • @mozway So in Pandas "index" and "label" are synonyms (e.g. as the argument to `pd.DataFrame`, `index` means label), while the "i" in `iloc` stands for "integer", not "index"? My mistake was thinking that the "i" stood for "index" and that indexing and labeling were distinct. What's the logic of conflating the terms in the Pandas API? Is that reasoning explained somewhere? – orome Jul 29 '22 at 21:33
  • There used to be an `ix` locator that could perform both positional and label indexing, but it's been deprecated because it was ambiguous. Index is the name of the structure (the columns are also an Index); label refers to the name of an individual item (though I too often use "index" in the sense of label as well). – mozway Jul 29 '22 at 21:39
  • @mozway So here's what needs some explaining IMV: the "i" in `iloc` and the "idx" in `idxmax` don't refer to the same thing. The former refers to "position" or "sequence" (neither of which begin with "i") while the latter refers to "label" (which shares no letters with "idx", but is sometimes but not always synonymous with "index"). Why do that? – orome Jul 29 '22 at 21:58
  • I think you're overthinking it. [The doc uses "*integer-location*" for `iloc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html). You should probably try to keep this as a mnemonic and move on ;) – mozway Jul 30 '22 at 00:37
  • @mozway Yes, I should certainly move on, but this just isn't up to the standards of Python APIs generally. A fix could take a lot of forms (e.g., `loc` could be `idxloc`), but something is needed IMV. – orome Jul 30 '22 at 16:00

1 Answer


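Use .loc, not .iloc. idxmax returns the index label of the maximum (126 in your data), not its position. .iloc selects by integer position, whereas .loc selects by index label, which is what you want here.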
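
As a minimal sketch of the difference, assuming a small made-up Series (the three timestamps are borrowed from the question, but the labels 2, 0, 1 are hypothetical and chosen so that labels and positions disagree without going out of range):

```python
import pandas as pd

# Hypothetical miniature of the question's data: the values are sorted
# ascending, but the index labels (2, 0, 1) do not match the positions.
s = pd.Series(
    [
        pd.Timestamp("2022-07-22 15:12:11.298", tz="UTC"),
        pd.Timestamp("2022-07-22 15:12:15.032", tz="UTC"),
        pd.Timestamp("2022-07-22 15:12:17.532", tz="UTC"),
    ],
    index=[2, 0, 1],
    name="image-capture_time",
)

label = s.idxmax()       # index label of the maximum (1), not its position (2)

by_label = s.loc[label]  # label-based lookup: the maximum timestamp
by_pos = s.iloc[label]   # position-based lookup: the row at position 1

# If a position is what you need, argmax returns one that pairs with iloc.
pos = s.argmax()
also_max = s.iloc[pos]
```

Here by_label and also_max are both the latest timestamp, while by_pos is the middle one, which is the same kind of mismatch as in the question.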

rafaelc
  • So the "i" in `iloc` [stands for "position"](https://stackoverflow.com/questions/73083316/why-does-pandas-idxmax-fail-to-find-maximum-timestamp#comment129228395_73083316)? – orome Jul 29 '22 at 21:34
  • @orome iloc means integer location. – rafaelc Jul 30 '22 at 14:15