0

Headache time.

I have a dataframe where an important column is of the type 'datetime64[ns]'. There are >1500000 rows, the datetime column contains about 3000 unique 'start times' (runs of a piece of equipment).

After performing some groupbys and data analysis I have ID'd ~ 30 runs of interest (identifiable by their run/start_time).analysis_df.start_time.unique() creates a numpy array of these datetimes.

I would like to use to pull out the raw data for these ~30 runs, and was hoping to use isin a la Filter dataframe rows if value in column is in a set list of values, however I have fallen foul of ISIN function does not work for dates and I'm a bit stuck.

I've looked into converting formats: Converting between datetime, Timestamp and datetime64 but the fact that Wes refers to this aspect as 'Hell' is doesn't fill me with hope! https://stackoverflow.com/a/13753918/2658194

Any help on how to achieve this 'filter' would be greatly appreciated - if anyone has a different (better!) approach i'd be happy to change up.

Thanks

BAC83
  • 811
  • 1
  • 12
  • 27

1 Answers1

2

ISIN function does not work for dates

That's not true. The question you have quoted is 5 years old. You can use a list, series or array. Here's a demonstration, tested on Pandas 0.19.2.

df = pd.DataFrame({'A': pd.to_datetime(['2018-01-01', '2018-11-30', '2018-12-24'])})

L = ['2018-01-01', '2018-12-24']
S = pd.to_datetime(L)
A = S.values

res1 = df[df['A'].isin(L)]
res2 = df[df['A'].isin(S)]
res3 = df[df['A'].isin(A)]

assert (res1.values == res2.values).all()
assert (res1.values == res3.values).all()

print(res1)

#            A
# 0 2018-01-01
# 2 2018-12-24
jpp
  • 159,742
  • 34
  • 281
  • 339
  • 1
    Awkwardly, yes this does work - I had been trying a work around, converting the array to a list, and using that, and getting errors galore. I've just left it as an array and it's worked. Thanks very much. – BAC83 Sep 25 '18 at 15:04