0

I have below two columns in this dataframe.

      DATE1                       DATE2

2020-07-08 23:54:17.0   2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0   2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0   2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0   2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0   2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0   2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0   2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0   2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0   2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0   2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0   2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0   2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0   2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0   2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0   2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0   2020-07-08 13:20:54.9810000

I would want to find the difference between these timestamps and add a boolean column which says if the difference between these two dates is greater than 24 hours or not.

I tried out the foloowing snippet but got the error: 'unsupported operand type(s) for -: 'str' and 'str''

df['diff_hours'] = df['DATE2'] - df['DATE1']
df['diff_hours']= df['diff_hours']/np.timedelta64(1,'h')

Can someone help me out with this snippet or have some other way around to tackle this easily? Thanks in advance!

arsenicFC
  • 17
  • 7
  • 1
    First try converting the columns to datetime object than try subtracting. – prax Jul 23 '20 at 05:22
  • https://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu – prax Jul 23 '20 at 05:26
  • Does this answer your question? [Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes](https://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu) – Trenton McKinney Aug 23 '20 at 01:13

2 Answers2

2

The sample data doesn't have the time difference greater than 24hrs

In [26]: df = pd.read_csv("a.csv", parse_dates=["DATE1","DATE2"])

In [27]: df
Out[27]:
                 DATE1                   DATE2
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981

In [28]: df["diff_hours"] = (df.DATE2-df.DATE1).astype('timedelta64[h]')

In [29]: df
Out[29]:
                 DATE1                   DATE2  diff_hours
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0

In [30]: df["status"] = df["diff_hours"] > 24

In [31]: df
Out[31]:
                 DATE1                   DATE2  diff_hours  status
0  2020-07-08 23:54:17 2020-07-09 19:00:56.997        19.0   False
1  2020-07-08 08:22:28 2020-07-08 13:23:10.363         5.0   False
2  2020-07-08 10:24:25 2020-07-08 13:25:30.899         3.0   False
3  2020-07-08 20:19:35 2020-07-09 18:57:07.690        22.0   False
4  2020-07-08 06:07:45 2020-07-08 13:20:49.996         7.0   False
5  2020-07-08 10:20:25 2020-07-08 13:25:20.039         3.0   False
6  2020-07-08 19:18:23 2020-07-09 18:56:06.655        23.0   False
7  2020-07-08 22:12:03 2020-07-09 18:59:11.625        20.0   False
8  2020-07-08 09:38:44 2020-07-08 13:24:44.982         3.0   False
9  2020-07-08 09:54:44 2020-07-08 13:24:45.375         3.0   False
10 2020-07-08 06:23:45 2020-07-08 13:21:05.515         6.0   False
11 2020-07-08 18:49:17 2020-07-09 18:55:41.971        24.0   False
12 2020-07-08 19:47:23 2020-07-09 18:56:37.769        23.0   False
13 2020-07-08 10:48:25 2020-07-08 13:25:45.006         2.0   False
14 2020-07-08 05:30:45 2020-07-08 13:20:15.892         7.0   False
15 2020-07-08 06:09:45 2020-07-08 13:20:54.981         7.0   False
bigbounty
  • 16,526
  • 5
  • 37
  • 65
  • facing the error => TypeError: unsupported operand type(s) for -: 'str' and 'str' – arsenicFC Jul 23 '20 at 05:35
  • If you are reading from csv, then use the code in the answer, if you are reading as string, then use `pd.to_datetime(df["DATE1"])` and for `DATE2` and then run the above code – bigbounty Jul 23 '20 at 05:37
0

You want to make those columns the date datatype.

Try

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
DATE1                   DATE2
2020-07-08 23:54:17.0   2020-07-09 19:00:56.9970000
2020-07-08 08:22:28.0   2020-07-08 13:23:10.3630000
2020-07-08 10:24:25.0   2020-07-08 13:25:30.8990000
2020-07-08 20:19:35.0   2020-07-09 18:57:07.6900000
2020-07-08 06:07:45.0   2020-07-08 13:20:49.9960000
2020-07-08 10:20:25.0   2020-07-08 13:25:20.0390000
2020-07-08 19:18:23.0   2020-07-09 18:56:06.6550000
2020-07-08 22:12:03.0   2020-07-09 18:59:11.6250000
2020-07-08 09:38:44.0   2020-07-08 13:24:44.9820000
2020-07-08 09:54:44.0   2020-07-08 13:24:45.3750000
2020-07-08 06:23:45.0   2020-07-08 13:21:05.5150000
2020-07-08 18:49:17.0   2020-07-09 18:55:41.9710000
2020-07-08 19:47:23.0   2020-07-09 18:56:37.7690000
2020-07-08 10:48:25.0   2020-07-08 13:25:45.0060000
2020-07-08 05:30:45.0   2020-07-08 13:20:15.8920000
2020-07-08 06:09:45.0   2020-07-08 13:20:54.9810000
"""), sep='\s\s+')

df['ge24'] = pd.to_datetime(df.DATE2) - pd.to_datetime(df.DATE1) > '24 hours'
print(df)

Output

                    DATE1                        DATE2   ge24
0   2020-07-08 23:54:17.0  2020-07-09 19:00:56.9970000  False
1   2020-07-08 08:22:28.0  2020-07-08 13:23:10.3630000  False
2   2020-07-08 10:24:25.0  2020-07-08 13:25:30.8990000  False
3   2020-07-08 20:19:35.0  2020-07-09 18:57:07.6900000  False
4   2020-07-08 06:07:45.0  2020-07-08 13:20:49.9960000  False
5   2020-07-08 10:20:25.0  2020-07-08 13:25:20.0390000  False
6   2020-07-08 19:18:23.0  2020-07-09 18:56:06.6550000  False
7   2020-07-08 22:12:03.0  2020-07-09 18:59:11.6250000  False
8   2020-07-08 09:38:44.0  2020-07-08 13:24:44.9820000  False
9   2020-07-08 09:54:44.0  2020-07-08 13:24:45.3750000  False
10  2020-07-08 06:23:45.0  2020-07-08 13:21:05.5150000  False
11  2020-07-08 18:49:17.0  2020-07-09 18:55:41.9710000   True
12  2020-07-08 19:47:23.0  2020-07-09 18:56:37.7690000  False
13  2020-07-08 10:48:25.0  2020-07-08 13:25:45.0060000  False
14  2020-07-08 05:30:45.0  2020-07-08 13:20:15.8920000  False
15  2020-07-08 06:09:45.0  2020-07-08 13:20:54.9810000  False
Balaji Ambresh
  • 4,977
  • 2
  • 5
  • 17