0

I have a list dates with dates as string objects with all dates from 2003-01-01 to 2017-06-30:

['2003-01-01', '2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05', '2003-01-06', '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-11', '2003-01-12', '2003-01-13', '2003-01-14', '2003-01-15', '2003-01-16', '2003-01-17', '2003-01-18', '2003-01-19', '2003-01-20', '2003-01-21',...]

And I have a dataframe with date time objects:

0       2013-01-09 09:07:49
1       2013-01-09 09:16:25
2       2013-01-09 11:14:28
3       2013-01-09 11:25:51
4       2013-01-09 11:25:51
5       2013-01-09 11:33:35
6       2013-01-09 11:35:31
7       2013-01-09 18:11:03
8       2013-03-13 21:04:58
9       2013-03-13 21:05:57
10      2013-03-15 14:07:27
11      2013-03-26 21:53:35
12      2013-03-26 22:19:20
13      2013-04-09 14:21:48
14      2013-04-09 14:22:29
15      2013-04-09 14:22:45
16      2013-04-22 12:10:47
...

Then I want to check if the dates in the list dates exist in the dataframe. So I want to do something like this:

df = pd.read_csv("blabla.csv")
df['time'] = pd.to_datetime(df['time'])
for j in dates:
  if j in df['time']:
       return(yes)

How can I compare a string object with a datetime object in a dataframe?

This is my full code:

dates=[]       
start = date(2003, 1, 1)
end = date(2017, 6, 30)

delta = end - start

for i in range(delta.days + 1):
    newdate = start + timedelta(days=i)
    dates.append(newdate.strftime("%Y-%m-%d"))

df = pd.read_csv("blabla.csv",parse_dates=True)
df['time'] = pd.to_datetime(df['time'])
Melli
  • 1
  • 6
  • https://stackoverflow.com/questions/466345/converting-string-into-datetime, then just do your comparison – Collin Phillips May 29 '19 at 15:38
  • Convert the strings to dates first. See https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior – Varun Balupuri May 29 '19 at 15:38
  • Did you check my solution, its fairly simple. When using pandas, its disencouraged to use `for loops` and `if statements` because they are quite slow and can get unreadable. Pandas had lot of methods for most of the problems. `isin` simply checks if the values in your column are present in your list, hence the name _isin_ – Erfan May 29 '19 at 19:37

2 Answers2

0

You can use parse_dates in your call to read_csv

df = pd.read_csv("blabla.csv",parse_dates=True)

This will result in datetime objects instead of strings. Then you could use the .isin method to see if the elements of one column are in the other.

df['time'].isin(dates)

This will return a series of True and False for each value in df['time']

iamchoosinganame
  • 1,090
  • 6
  • 15
  • When I try this my object type is: 'pandas.core.series.Series' is this correct? – Melli May 29 '19 at 15:44
  • I made an edit to show how to use isin. Every column of a dataframe is a pandas series. Try ```df['time'].dtype``` to see the type of the values. – iamchoosinganame May 29 '19 at 15:46
  • Did you test your proposed solution? Because this is incorrect. Date column has time as well, in the list there are dates, so it will never find matches. – Erfan May 29 '19 at 19:32
0

First we convert your time column to datetime, so we can acces solely the dates with Series.dt.dates. When we extracted the date from your datetime we convert it to string so we can compare it to your list.

Finally we use the isin method to create your new column as an indicator.

df['Time'] = pd.to_datetime(df['Time'])

df['Indicator'] = df['Time'].dt.date.astype(str).isin(dates)

                  Time  Indicator
0  2013-01-09 09:07:49      False
1  2013-01-09 09:16:25      False
2  2013-01-09 11:14:28      False
3  2013-01-09 11:25:51      False
4  2013-01-09 11:25:51      False
5  2013-01-09 11:33:35      False
6  2013-01-09 11:35:31      False
7  2013-01-09 18:11:03      False
8  2013-03-13 21:04:58      False
9  2013-03-13 21:05:57      False
10 2013-03-15 14:07:27      False
11 2013-03-26 21:53:35      False
12 2013-03-26 22:19:20      False
13 2013-04-09 14:21:48      False
14 2013-04-09 14:22:29      False
15 2013-04-09 14:22:45      False
16 2013-04-22 12:10:47      False

Note: your list is not a very good example, since none of the dates exist in your dataframe, that's why it returns all False.

If I use a list which has some date of your dataframe, it will look like the following:

dates = ['2013-01-09', '2013-02-09', '2013-03-26']
df['Indicator'] = df['Time'].dt.date.astype(str).isin(dates)

                  Time  Indicator
0  2013-01-09 09:07:49       True
1  2013-01-09 09:16:25       True
2  2013-01-09 11:14:28       True
3  2013-01-09 11:25:51       True
4  2013-01-09 11:25:51       True
5  2013-01-09 11:33:35       True
6  2013-01-09 11:35:31       True
7  2013-01-09 18:11:03       True
8  2013-03-13 21:04:58      False
9  2013-03-13 21:05:57      False
10 2013-03-15 14:07:27      False
11 2013-03-26 21:53:35       True
12 2013-03-26 22:19:20       True
13 2013-04-09 14:21:48      False
14 2013-04-09 14:22:29      False
15 2013-04-09 14:22:45      False
16 2013-04-22 12:10:47      False

Extensive information about isin: link

Erfan
  • 40,971
  • 8
  • 66
  • 78