2

I have a pandas DataFrame containing a 'Date Time' column with values like 2019-01-02 09:00:00 (i.e. January 2, 2019, 9 AM).

There may be a bunch of rows which have the same date in the Date Time column.

In other words, I can have 2019-01-02 09:00:00 or 2019-01-02 09:15:00 or 2019-01-02 09:30:00 and so on.

Now I need to find the row index of the first occurrence of the date 2019-01-02 in the DataFrame.

I can obviously do this with a loop, but I am wondering if there is a better way.

With the df['Date Time'].str.contains() method I can get all the rows that match a given date, but I need the index.

The generic question: how do we find the index of the first cell in a DataFrame column that matches a given string pattern?

The more specific question: how do we find the index of the first cell in a date-time column that matches a given date, assuming the DataFrame is sorted in chronologically ascending order of date-time, i.e. 2019-01-02 09:00:00 occurs at an earlier index than 2019-01-02 09:15:00, followed by 2019-01-03 09:00:00, and so on?

Thank you for any inputs

Ramana
  • 1
    can you not just call the `index` function?: `df[df['Date Time'].dt.date == pd.Timestamp('2019-01-02').date()].head(1).index` – It_is_Chris Jan 18 '19 at 14:36
  • It would be better to create at least a small sample DataFrame and show what you tried, whether or not it works, so that we can see the data and give an appropriate answer; a text-only description is hard to follow. – Karn Kumar Jan 18 '19 at 14:40

3 Answers

3

You can use next with iter to get the first index value that matches the condition; the default argument prevents a failure when no value matches:

import pandas as pd

df = pd.DataFrame({'dates':pd.date_range(start='2018-01-01 20:00:00',
                                         end='2018-01-02 02:00:00', freq='H')})
print (df)
                dates
0 2018-01-01 20:00:00
1 2018-01-01 21:00:00
2 2018-01-01 22:00:00
3 2018-01-01 23:00:00
4 2018-01-02 00:00:00
5 2018-01-02 01:00:00
6 2018-01-02 02:00:00

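date = '2018-01-02'
# >= selects every row on or after the date, so the first True is the
# first row of that date only because the column is sorted ascending
mask = df['dates'] >= date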
idx = next(iter(mask.index[mask]), 'not exist')
print (idx)
4


date = '2018-01-08'
mask = df['dates'] >= date
idx = next(iter(mask.index[mask]), 'not exist')
print (idx)
not exist

If performance is important, see Efficiently return the index of the first value satisfying condition in array.
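Since the question states that the column is sorted in ascending order, a binary search is one way to avoid building a full boolean mask. The following is only a minimal sketch under that assumption: the helper name first_index_on_date and the sample data are made up for illustration, and it returns None (rather than 'not exist') when the date is absent.

```python
import pandas as pd

# Hypothetical sample data; the column must already be sorted ascending.
df = pd.DataFrame({
    "dates": pd.to_datetime([
        "2018-01-01 20:00:00",
        "2018-01-01 23:00:00",
        "2018-01-02 00:00:00",
        "2018-01-02 01:00:00",
        "2018-01-03 09:00:00",
    ])
})

def first_index_on_date(frame, column, date):
    """Return the index label of the first row whose timestamp falls on
    `date`, or None if no row does. Assumes frame[column] is sorted."""
    day = pd.Timestamp(date).normalize()  # midnight of the requested day
    # Position of the first timestamp >= midnight, found by binary search.
    pos = frame[column].searchsorted(day, side="left")
    # The row at that position may belong to a later day, so check it.
    if pos < len(frame) and frame[column].iloc[pos].normalize() == day:
        return frame.index[pos]
    return None

print(first_index_on_date(df, "dates", "2018-01-02"))
print(first_index_on_date(df, "dates", "2018-01-08"))
```

Unlike the plain >= mask, this also reports a miss when the requested date falls in a gap between two days that are present.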

jezrael
1

Yes, you can use .loc with a condition to filter the DataFrame, then take the first remaining row with .iloc and read its index.

import pandas as pd
df = pd.DataFrame({'time':pd.date_range(start='2018-01-01 00:00:00',end='2018-12-31 00:00:00', freq='H')}, index=None).reset_index(drop=True)

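# then use a condition and .iloc to get the first matching row;
# >= (rather than >) keeps a row that sits exactly on the boundary
df.loc[df['time']>='2018-10-30 01:00:00'].iloc[[0,]].index[0]

# if you specify a coarser condition, for instance without a time,
# the string is read as midnight, so this returns the first row of that day
# (the column must be sorted, and the date must actually be present)
df.loc[df['time']>='2018-10-30'].iloc[[0,]].index[0]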
BenP
0

I do not know if it is optimal, but it works:

(df['Date Time'].dt.strftime('%Y-%m-%d') == '2019-01-02').idxmax()
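One caveat worth illustrating: on a boolean Series, idxmax returns the first index label even when no element is True, so a missing date would silently look like a match at the first row. A minimal sketch of a guard follows, assuming a small made-up frame; the helper name first_match_index is hypothetical, and dt.normalize() is used here in place of strftime to compare dates.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's 'Date Time' column.
df = pd.DataFrame({
    "Date Time": pd.to_datetime([
        "2019-01-01 09:00:00",
        "2019-01-02 09:00:00",
        "2019-01-02 09:15:00",
        "2019-01-03 09:00:00",
    ])
})

def first_match_index(mask):
    """Return the index label of the first True in a boolean Series,
    or None when the mask contains no True at all."""
    return mask.idxmax() if mask.any() else None

# Compare on the date part only by resetting each timestamp to midnight.
days = df["Date Time"].dt.normalize()

found = first_match_index(days == pd.Timestamp("2019-01-02"))
missing = first_match_index(days == pd.Timestamp("2019-02-01"))
print(found, missing)
```

The same helper works for the generic case in the question, for example with a mask built by str.contains on a string column.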
cors