2

I have a pandas Series of dates:

>>> d.head()
Out[55]: 
0   2010-06-01
1   2010-06-02
2   2010-06-03
3   2010-06-04
4   2010-06-07
dtype: datetime64[ns]

I am not able to check whether a given date is contained in it:

>>> d.iloc[1]
Out[59]: Timestamp('2010-06-02 00:00:00')

>>> d.iloc[1] in d
Out[60]: False

>>> np.datetime64(d.iloc[1]) in d
Out[61]: False

>>> d.iloc[1] in pd.to_datetime(d)
Out[62]: False

>>> pd.to_datetime(d.iloc[1]) in pd.to_datetime(d)
Out[63]: False

What's the best way to check this?

To answer some of the comments below:

Using values doesn't solve it:

>>> d.iloc[1] in d.values
Out[69]: False

I don't think it is a matter of iloc returning a row rather than a value:

>>> x= pd.Timestamp('2010-6-2')
>>> x
Out[72]: Timestamp('2010-06-02 00:00:00')
>>> x in d
Out[73]: False
>>> x in pd.to_datetime(d)
Out[74]: False
>>> x in d.values
Out[75]: False
dayum

3 Answers

1

Try the code below. It checks a value taken from the series against the set of the series' values, which will of course be True.

I believe your comparison fails because the in operator acting on a pd.Series checks for membership in the series index, not in the series values. Wrapping the series in set ensures that the values are used for the comparison.

# df
#     date
# 0   2010-06-01
# 1   2010-06-02
# 2   2010-06-03
# 3   2010-06-04
# 4   2010-06-07

# convert date column to datetime
df.date = pd.to_datetime(df.date)

df.date[1] in set(df.date)
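To make the index-versus-values point concrete, here is a minimal sketch. The series `d` is rebuilt from the five dates shown in the question (an assumption about the data, which is only partly visible), and the variable names are purely illustrative.

```python
import pandas as pd

# Rebuild a small Series like the one in the question (assumed data).
d = pd.Series(pd.to_datetime(
    ["2010-06-01", "2010-06-02", "2010-06-03", "2010-06-04", "2010-06-07"]
))
x = pd.Timestamp("2010-06-02")

# `in` on a Series tests the index labels (0..4 here), not the values.
label_hit = 1 in d          # 1 is an index label
value_as_label = x in d     # x is a value, not an index label

# Iterating the Series yields Timestamp objects, so a set holds the values.
value_hit = x in set(d)

print(label_hit, value_as_label, value_hit)
```

Building a set costs a full pass over the data, so for repeated lookups it is probably worth building it once and reusing it.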
jpp
  • @jp_data_analysis sorry, that didn't seem to help. I included my results in the edit above – dayum Feb 01 '18 at 01:21
  • @dayum have you tried applying `pd.to_datetime` to the value you are checking? This was what was advised in the link @GarbaceCollector provided. – jpp Feb 01 '18 at 01:25
  • @jp_data_analysis yup, that's been reported above – dayum Feb 01 '18 at 01:25
  • @jp_data_analysis yes, that worked. I just posted a detailed answer as to why that seemed to work, and another solution to this – dayum Feb 01 '18 at 01:33
1

Here's one possible answer I got by trial and error; I'm not sure if I am missing something.

Checking d shows that its dtype is datetime64[ns]:

>>> d.head()
Out[55]: 
0   2010-06-01
1   2010-06-02
2   2010-06-03
3   2010-06-04
4   2010-06-07
dtype: datetime64[ns]

The same holds for d.values:

>>> d.values
Out[76]: 
array(['2010-05-31T20:00:00.000000000-0400', '2010-06-01T20:00:00.000000000-0400',.....], dtype='datetime64[ns]')

But selecting a single element returns a Timestamp instead:

>>> d.iloc[1]
Out[82]: Timestamp('2010-06-02 00:00:00')

So I did this, which worked:

>>> x= pd.Timestamp('2010-6-2')
>>> x
Out[72]: Timestamp('2010-06-02 00:00:00')
>>> np.datetime64(x) in d.values
Out[77]: True

@jp_data_analysis's suggestion of using set also worked, as it keeps the elements as Timestamp objects:

>>> set(d.iloc[:])
Out[81]: 
{Timestamp('2015-10-13 00:00:00'),
 Timestamp('2011-07-18 00:00:00'),......

>>> x in set(d.iloc[:])
Out[83]: True
dayum
-1

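You can do the following with .isin (note that .isin requires a list-like as input, not a scalar). It returns a boolean Series with one entry per row, so add .any() to get a single True/False:

df.date = pd.to_datetime(df.date)

df.date.isin([df.date.iloc[1]]).any()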
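For illustration, a self-contained sketch of the `.isin` approach follows. The DataFrame and its `date` column are reconstructed from the dates in the question (an assumption), and `mask`, `present` and `matches` are hypothetical names.

```python
import pandas as pd

# Assumed data: a 'date' column holding the dates shown in the question.
df = pd.DataFrame({"date": ["2010-06-01", "2010-06-02", "2010-06-03",
                            "2010-06-04", "2010-06-07"]})
df["date"] = pd.to_datetime(df["date"])

target = df["date"].iloc[1]

# isin takes a list-like and returns a boolean mask, one entry per row.
mask = df["date"].isin([target])

# Reduce the mask to a single yes/no answer.
present = bool(mask.any())

# The same mask can also select the matching rows.
matches = df.loc[mask]

print(present)
print(matches)
```

The mask is useful when the matching rows are needed as well; if only a yes/no answer matters, `.any()` on the mask is enough.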
paul_dg