8

I have a dataframe like this:

      category  date            number
0      Cat1     2010-03-01      1
1      Cat2     2010-09-01      1
2      Cat3     2010-10-01      1
3      Cat4     2010-12-01      1
4      Cat5     2012-04-01      1
5      Cat2     2013-02-01      1
6      Cat3     2013-07-01      1
7      Cat4     2013-11-01      2
8      Cat5     2014-11-01      5
9      Cat2     2015-01-01      1
10     Cat3     2015-03-01      1

I would like to check if a date is exist in this dataframe but I am unable to. I tried various ways as below but still no use:

if pandas.Timestamp("2010-03-01 00:00:00", tz=None) in df['date'].values:
    print 'date exist'

if datetime.strptime('2010-03-01', '%Y-%m-%d') in df['date'].values:
    print 'date exist'

if '2010-03-01' in df['date'].values:
    print 'date exist'  

The 'date exist' never got printed. How could I check if the date exist? Because I want to insert the none-existed date with number equals 0 to all the categories so that I could plot a continuously line chart (one category per line). Help is appreciated. Thanks in advance.

The last one gives me this: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison And the date exist not get printed.

Leo
  • 265
  • 1
  • 4
  • 18
  • 2
    What is the `dtype` of `df['date']` as it matters, you can also just do `(df['date'] == yourdate).any()` – EdChum Oct 06 '16 at 10:21

3 Answers3

6

I think you need convert to datetime first by to_datetime and then if need select all rows use boolean indexing:

df.date = pd.to_datetime(df.date)

print (df.date == pd.Timestamp("2010-03-01 00:00:00"))
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
Name: date, dtype: bool

print (df[df.date == pd.Timestamp("2010-03-01 00:00:00")])
  category       date  number
0     Cat1 2010-03-01       1

For return True use check value converted to numpy array by values:

if ('2010-03-01' in df['date'].values):
    print ('date exist')

Or at least one True by any as comment Edchum:

if (df.date == pd.Timestamp("2010-03-01 00:00:00")).any():
    print ('date exist')  
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Curious that the last one doesn't work. Unless it's an issue with extra spaces or something. As an aside, do you know why it is necessary to use `.values` on a `Series` when using `in`? Ambiguity between values and index? – juanpa.arrivillaga Oct 06 '16 at 10:26
  • @juanpa.arrivillaga - Obviously `values` can be omited, I dont know how it works with datetimes. – jezrael Oct 06 '16 at 10:28
  • Well, in my interpreter shell if I use `'2010-03-01' in df['date']` I am getting `False`, but if I use `'2010-03-01' in df['date'].values` I get true. – juanpa.arrivillaga Oct 06 '16 at 10:29
  • `df.date = pd.to_datetime(df.date) if pd.Timestamp("2010-03-01 00:00:00") in df['date'].values: print 'date exist'` this one still not work. I could not make it to print out the 'date exist'. I need to get in the if cause because I need to add data to dataframe if that date is not existed – Leo Oct 06 '16 at 10:30
  • Interesting, poking around in the internals, the `__contains__` method for `pandas/core/generic.py` is simply `return key in self._info_axis`. So it checks for the index! – juanpa.arrivillaga Oct 06 '16 at 10:34
  • @juanpa.arrivillaga - it is `numpy` representation of `datetime`, so it works, but I cannot explain it, sorry. – jezrael Oct 06 '16 at 10:36
  • @jezrael huh? My point was if you have a `Series` with a value `x`, you must use `x in my_series.values` to get `True`. Or else it checks the `index`. This was surprising to me, but I guess it makes sense. – juanpa.arrivillaga Oct 06 '16 at 10:41
  • @juanpa.arrivillaga - it seems with datetimes it is different, but I am not sure. – jezrael Oct 06 '16 at 10:43
  • @juanpa.arrivillaga - but maybe help [this](http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64). – jezrael Oct 06 '16 at 10:44
  • My issue wasn't really with datetimes, since I was dealing with strings anyway. I just learned today that pandas data structures have a `__contains___` method that checks for membership in the index rather than the values. It makes sense now that I think about it, and I guess I never noticed it because I always use `==` to check for membership. – juanpa.arrivillaga Oct 06 '16 at 10:49
1
import pandas as pd
# Create some dates:
data = {'date': ['2021-11-16', '2021-11-17', '2021-11-18']}
df = pd.DataFrame(data)
# Convert dates in dataframe to pandas timestamps:
dates = pd.to_datetime(df['date'])
pd.Timestamp('2021-11-17') in dates.tolist()  # returns True
pd.Timestamp('2021-11-15') in dates.tolist()  # returns False
mjkrause
  • 406
  • 1
  • 7
  • 14
0

For example, to cofirm that the 4th value of ds is contained within itself:

len(set(ds.isin([ds.iloc[3]]))) > 1

Let ds be a Pandas DataSeries of the form [index, pandas._libs.tslib.Timestamp] with example values:

0 2018-01-31 19:08:27.465515 1 2018-02-01 19:08:27.465515 2 2018-02-02 19:08:27.465515 3 2018-02-03 19:08:27.465515 4 2018-02-04 19:08:27.465515

Then, we use the isin local method to get a DataSeries of booleans where each entry indicates wether that position in ds matches with the value passed as argument to the function (since isin expects a list of values we need to provide the value in list format).

Next, we use the set global method as to get a set with 1 or 2 values depending on wether there was a match (True and False values) or not (only a False value).

Finally, we check if the set contains more than 1 value, if that is the case, it means we have a match, and no match otherwise.

JoseGzz
  • 25
  • 5