0

I have tried many suggestions from here but none of them solved. I have two columns with observations like this: 15:08:19

If I write

df.time_entry.describe() 

it appears:

count       814262
unique       56765
top       15:03:00
freq           103
Name: time_entry, dtype: object

I've already run this code:

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S', errors='ignore' ).dt.time

But rerunning the describe code still returns dtype: object.

edesz
  • 11,756
  • 22
  • 75
  • 123

2 Answers2

1

What is the purpose of dt.time?

Just remove dt.time and your conversion from object to datetime will work perfectly fine.

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S')
  • Thank you for the comment. Doing what you suggest leads to the transformation of dtype to int64, but all the values of the columns are turned to Nat. – Lucas Carvalho May 05 '19 at 00:50
  • 1
    Can you show a few samples from your dataset on which i can work on? – Suyash Sreekumar May 05 '19 at 00:57
  • This is the head(5) of the dataset with no conversion: https://i.imgur.com/Ufk7Tvz.jpg – Lucas Carvalho May 05 '19 at 01:35
  • I restarted the kernel of the notebook and now the output of the conversion you suggested changed (this is beyond science to me): https://i.imgur.com/YGv8V8u.jpg – Lucas Carvalho May 05 '19 at 01:37
  • Suyash, I copy here the answer a gave to edesz above: Thanks, It worked. I was able to create a column (tempo) with the differences between them. The weird thing though is that doing a df['time_exit'].describe() shows the the dtype is object while doing print(df.dtypes) shows that dtype is datetime64[ns]. Any explanation for this? – Lucas Carvalho May 05 '19 at 01:53
  • It's dumb, I know, but I think I was fooled by the describe() result. – Lucas Carvalho May 05 '19 at 01:54
  • @LucasCarvalho i looked into your query, I found this link to be the closest to your doubt . Hope it helps https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object – Suyash Sreekumar May 05 '19 at 03:52
0

The problem is that you are using the datetime accessor (.dt) with the property time and then you are not able to subtract the two columns from eachother. So, just leave out .dt.time and it should work.

Here is some data with 2 columns of strings

df = pd.DataFrame()
df['time_entry'] = ['12:01:00', '15:03:00', '16:43:00', '14:11:00']
df['time_entry2'] = ['13:03:00', '14:04:00', '19:23:00', '18:12:00']

print(df)
  time_entry time_entry2
0   12:01:00    13:03:00
1   15:03:00    14:04:00
2   16:43:00    19:23:00
3   14:11:00    18:12:00

Convert both columns to datetime dtype

df['time_entry'] = pd.to_datetime(df['time_entry'], format= '%H:%M:%S', errors='ignore')
df['time_entry2'] = pd.to_datetime(df['time_entry2'], format= '%H:%M:%S', errors='ignore')

print(df)
           time_entry         time_entry2
0 1900-01-01 12:01:00 1900-01-01 13:03:00
1 1900-01-01 15:03:00 1900-01-01 14:04:00
2 1900-01-01 16:43:00 1900-01-01 19:23:00
3 1900-01-01 14:11:00 1900-01-01 18:12:00

print(df.dtypes)
time_entry     datetime64[ns]
time_entry2    datetime64[ns]
dtype: object

(Optional) Specify timezone

df['time_entry'] = df['time_entry'].dt.tz_localize('US/Central')
df['time_entry2'] = df['time_entry2'].dt.tz_localize('US/Central')

Now perform the time difference (subtraction) between the 2 columns and get the time difference in number of days (as a float)

df['Diff_days1'] = (df['time_entry'] - df['time_entry2']).dt.total_seconds()/60/60/24
df['Diff_days2'] = (df['time_entry'] - df['time_entry2']) / np.timedelta64(1, 'D')
df['Diff_days3'] = (df['time_entry'].sub(df['time_entry2'])).dt.total_seconds()/60/60/24

print(df)
           time_entry         time_entry2  Diff_days1  Diff_days2  Diff_days3
0 1900-01-01 12:01:00 1900-01-01 13:03:00   -0.043056   -0.043056   -0.043056
1 1900-01-01 15:03:00 1900-01-01 14:04:00    0.040972    0.040972    0.040972
2 1900-01-01 16:43:00 1900-01-01 19:23:00   -0.111111   -0.111111   -0.111111
3 1900-01-01 14:11:00 1900-01-01 18:12:00   -0.167361   -0.167361   -0.167361

EDIT

If you're trying to access datetime attributes, then you can do so by using the time_entry column directly (not the time difference column). Here's an example

df['day1'] = df['time_entry'].dt.day
df['time1'] = df['time_entry'].dt.time
df['minute1'] = df['time_entry'].dt.minute
df['dayofweek1'] = df['time_entry'].dt.weekday
df['day2'] = df['time_entry2'].dt.day
df['time2'] = df['time_entry2'].dt.time
df['minute2'] = df['time_entry2'].dt.minute
df['dayofweek2'] = df['time_entry2'].dt.weekday

print(df[['day1', 'time1', 'minute1', 'dayofweek1',
        'day2', 'time2', 'minute2', 'dayofweek2']])
   day1     time1  minute1  dayofweek1  day2     time2  minute2  dayofweek2
0     1  12:01:00        1           0     1  13:03:00        3           0
1     1  15:03:00        3           0     1  14:04:00        4           0
2     1  16:43:00       43           0     1  19:23:00       23           0
3     1  14:11:00       11           0     1  18:12:00       12           0
edesz
  • 11,756
  • 22
  • 75
  • 123
  • Thanks, It worked. I was able to create a column (tempo) with the differences between them. The weird thing though is that doing a df['time_exit'].describe() shows the the dtype is object while doing print(df.dtypes) shows that dtype is datetime64[ns]. Any explanation for this? – Lucas Carvalho May 05 '19 at 01:43
  • IIUC, `df['time_exit'].describe()` is a Pandas `Series` with mixed datatypes and so it appears as ab `object`. If you are concerned about the `dtype` of a column, then you must look at the output of `df[col].dtype` – edesz May 05 '19 at 01:55