I'm trying to use pandas.Index.get_loc to return the index (as an int) of the nearest value, but occasionally it returns a slice object instead. According to the documentation,

get_loc returns int if unique index, slice if monotonic index, else mask.

But it doesn't look like the behavior is consistent. For example, with the following index:

idx = pd.DatetimeIndex(['2019-12-24 12:04:54',
                        '2019-12-26 20:09:22',
                        '2020-12-27 07:44:35'])

Using idx.get_loc('2019-12-27', method='ffill') returns slice(2, 2, None), whereas idx.get_loc('2019-12-29', method='ffill') returns 2. Changing the method from 'ffill' to 'bfill' doesn't seem to change the result.

My aim is to slice all points from the beginning of the index, like idx[:i], where i is an int returned by get_loc. Another solution might be to modify the beginning of the slice object, if that is possible.

Edit: Apparently, a slice is a built-in object with read-only data attributes start, stop and step (see the docs). This means you can check whether the result of get_loc is an int and, if not, use idx[:result.stop], where result is the returned slice, to get all elements up to the desired index.
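A minimal sketch of that workaround, assuming the result is either an int or a slice (a boolean mask is not handled). The helper name head_up_to is made up for illustration, and the two loc values are the ones reported above rather than fresh get_loc calls:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2019-12-24 12:04:54',
                        '2019-12-26 20:09:22',
                        '2020-12-27 07:44:35'])

def head_up_to(index, loc):
    """Return index[:stop] for an int or slice location (hypothetical helper)."""
    # A slice carries its end position in .stop; an int is used as-is.
    stop = loc.stop if isinstance(loc, slice) else loc
    return index[:stop]

# Both values reported in the question select the same first two entries.
from_slice = head_up_to(idx, slice(2, 2, None))
from_int = head_up_to(idx, 2)
```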

I'm still interested in the original question though.

Lawrence

1 Answer

Let's start with basic definitions.

Unique Index

A unique index is an index that contains no duplicate labels: no label appears in it more than once.

To check whether a given index is unique, you can use the pd.Index.is_unique attribute, e.g.:

>>> pd.Index(['s', 'a', 'm']).is_unique
True
>>> pd.Index(['s', 'a', 'm', 'a']).is_unique
False

As the documentation mentions, an example of such an index would be pd.Index(list('abc')), which contains three unique labels a, b and c and happens to be monotonic as well. A unique non-monotonic index could be, for example, pd.Index(list('acb')), where the increasing order is broken by the step from c back to b.

Monotonic Index

Monotonicity is a mathematical property that indicates a given function maintains a non-increasing or non-decreasing order throughout its domain. In pandas, a monotonic index is an index that follows this property.

Similarly to uniqueness, you can check the monotonicity of an index with an attribute: pd.Index.is_monotonic_increasing or pd.Index.is_monotonic_decreasing. Older pandas versions also offer pd.Index.is_monotonic, an alias for is_monotonic_increasing that was removed in pandas 2.0.

In this case, the documentation presents another example: pd.Index(list('abbc')), which is a non-unique monotonic index with a duplicated label b. A non-unique non-monotonic index, pd.Index(list('abcb')), is also mentioned. The duplicated label is again b, while the order is broken at c -> b, which goes against the previously established order a -> b -> c.
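As a quick check, the two documentation examples can be tested with the directional attributes. The variable names, and the third index 'cba', are chosen here purely for illustration:

```python
import pandas as pd

abbc = pd.Index(list('abbc'))  # duplicated b, but the order never goes backwards
abcb = pd.Index(list('abcb'))  # duplicated b, and the order breaks at c -> b
cba = pd.Index(list('cba'))    # unique and strictly decreasing

abbc_increasing = abbc.is_monotonic_increasing  # True
abcb_increasing = abcb.is_monotonic_increasing  # False
cba_decreasing = cba.is_monotonic_decreasing    # True
```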


pd.Index.get_loc

This pandas Index method uses the concepts defined above to determine its return value. Its expected behaviour is specified as follows. If the index is unique, it is supposed to return an int position. If it is not unique, the method considers the monotonicity of the index: if the index is monotonic, it returns a slice; otherwise, it returns a boolean mask.
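The three cases can be reproduced with the small indexes from the documentation; the variable names are only for illustration:

```python
import pandas as pd

# Unique index: a single integer position.
loc_unique = pd.Index(list('abc')).get_loc('b')      # 1

# Non-unique but monotonic index: a slice covering the run of duplicates.
loc_monotonic = pd.Index(list('abbc')).get_loc('b')  # slice(1, 3, None)

# Non-unique and non-monotonic index: a boolean mask over all positions.
loc_mask = pd.Index(list('abcb')).get_loc('b')       # [False, True, False, True]
```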

Your sample index, idx, is unique (and monotonic, although that is irrelevant here), so you would expect get_loc to return an int. However, this is guaranteed only for exact label matches. It does not hold for partial matches, like the ones you are using: a date string without a time component matches a whole day rather than a single timestamp, so the result is a slice. The following output shows the difference:

>>> idx.get_loc('2019-12-24')
slice(0, 1, None)
>>> idx.get_loc('2019-12-24 12:04:54')
0
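If what you need is always an integer position, one possible alternative (not part of the behaviour described above, and only a sketch) is pd.Index.get_indexer, which accepts a method argument and returns an array of integer positions. It assumes the index is unique and monotonic, as yours is:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2019-12-24 12:04:54',
                        '2019-12-26 20:09:22',
                        '2020-12-27 07:44:35'])

# get_indexer takes a list-like of targets and returns one position per target.
target = pd.DatetimeIndex(['2019-12-27'])
pos = int(idx.get_indexer(target, method='ffill')[0])  # last label <= target

# pos is the position of the matched label, so include it with pos + 1.
head = idx[:pos + 1]
```

Note that get_indexer returns -1 when no label satisfies the fill method (for example, a target earlier than the first label with 'ffill'), so that case would need a separate check.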
Arn