I have the following dataset: https://i.stack.imgur.com/YPAE1.jpg

I want to create a new column that is the subtraction between time_exit and time_entry. However, when I try the code:

df[['tempo']] = df['time_exit'] - df['time_entry']

The result is: TypeError: unsupported operand type(s) for -: 'str' and 'str'

If I do:

df[['tempo']] = df[['time_exit']] - df[['time_entry']]

The result is: ValueError: Columns must be same length as key.

But doing a describe on both, they have the SAME count, that is 381185.

I'm lost.

3 Answers

Looking at the first error, your columns have the wrong datatype: both hold strings, and one string cannot be subtracted from another. So, you should convert these columns to datetimes:

df['time_exit'] = pd.to_datetime(df['time_exit'])
df['time_entry'] = pd.to_datetime(df['time_entry'])

then,

df['tempo'] = df['time_exit'] - df['time_entry']

should do the trick.
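As a minimal sketch of the conversion above: the real data is only available as an image, so the sample values below and their ISO-like format are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real values and their format are not shown as text.
df = pd.DataFrame({
    "time_entry": ["2019-05-01 07:15:00", "2019-05-01 08:00:00"],
    "time_exit": ["2019-05-01 08:15:00", "2019-05-02 10:00:00"],
})

# Timestamps loaded from text files are often stored as plain strings.
print(df.dtypes)

# Convert both columns to datetime64 so that subtraction is defined.
df["time_exit"] = pd.to_datetime(df["time_exit"])
df["time_entry"] = pd.to_datetime(df["time_entry"])

# Subtracting two datetime Series gives a timedelta Series.
df["tempo"] = df["time_exit"] - df["time_entry"]
print(df["tempo"])
```

Checking `df.dtypes` before and after the conversion is a quick way to confirm that the columns really changed type.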

Your second approach fails, because df[['time_exit']] and df[['time_entry']] return DataFrames, rather than a Series.

Subtracting two single-column DataFrames whose columns have different names aligns them on column labels. Because no labels match, the result is a third DataFrame with two columns, both filled with NaN, and a two-column DataFrame cannot be assigned to a single column.
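The label alignment can be seen in a small sketch. The column names come from the question, but the numeric values (seconds since midnight) are placeholders chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical frame: column names from the question, placeholder numeric values.
df = pd.DataFrame({
    "time_entry": [26100, 28800],  # 07:15:00 and 08:00:00 as seconds since midnight
    "time_exit": [29700, 36000],   # 08:15:00 and 10:00:00 as seconds since midnight
})

# Double brackets select one-column DataFrames; arithmetic aligns on column labels.
# No label appears in both frames, so every cell of the result is NaN.
frame_diff = df[["time_exit"]] - df[["time_entry"]]
print(frame_diff)

# Single brackets select Series, which are subtracted row by row.
series_diff = df["time_exit"] - df["time_entry"]
print(series_diff)
```

The first result has two columns and no usable values, while the second is a single Series that can be assigned to one new column.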

warped
  • I already went through this. Even adding the errors = coerce and format option arguments, the result is that every observation in the tempo column (which is created at least) is NaT. – Lucas Carvalho May 05 '19 at 00:48
  • alright, could you then post the head of the actual data as text, rather than image? check out [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for making good pandas examples. – warped May 05 '19 at 00:51
Use apply with Timedelta:

#sample data
df = pd.DataFrame({'start': ['07:15:00', '08:00:00'], 'end':['08:15:00', '10:00:00']})

# apply with pd.Timedelta
df['diff'] = df['end'].apply(pd.Timedelta) - df['start'].apply(pd.Timedelta) 

      start       end     diff
0  07:15:00  08:15:00 01:00:00
1  08:00:00  10:00:00 02:00:00
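A possible alternative to `apply`, sketched on the same sample data, is `pd.to_timedelta`, which parses a whole column of `HH:MM:SS` strings in one call. The `diff_seconds` column is an extra added here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({"start": ["07:15:00", "08:00:00"], "end": ["08:15:00", "10:00:00"]})

# pd.to_timedelta converts each column of 'HH:MM:SS' strings in a single call.
df["diff"] = pd.to_timedelta(df["end"]) - pd.to_timedelta(df["start"])

# Hypothetical extra column: the same duration as a plain number of seconds.
df["diff_seconds"] = df["diff"].dt.total_seconds()
print(df)
```

This only fits data that holds times of day without a date part, as in the sample above.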
It_is_Chris
I would recommend specifying the format of the time data explicitly first:

df['time_exit'] = pd.to_datetime(df['time_exit'], errors='coerce', format='%d/%m/%Y %H:%M:%S')
df['time_entry'] = pd.to_datetime(df['time_entry'], errors='coerce', format='%d/%m/%Y %H:%M:%S')

and after this:

df['tempo'] = df['time_exit'] - df['time_entry']

If you only need the difference in whole days:

df['tempo'] = (df['time_exit'] - df['time_entry']).dt.days
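The sketch below shows how an explicit format interacts with `errors='coerce'`. The day/month/year sample values are assumptions, and one entry is deliberately malformed to show what a failed parse looks like.

```python
import pandas as pd

# Hypothetical values in day/month/year order; the last exit time is deliberately malformed.
df = pd.DataFrame({
    "time_entry": ["01/05/2019 07:15:00", "02/05/2019 08:00:00", "03/05/2019 09:00:00"],
    "time_exit": ["03/05/2019 08:15:00", "02/05/2019 10:00:00", "not a date"],
})

fmt = "%d/%m/%Y %H:%M:%S"
entry = pd.to_datetime(df["time_entry"], format=fmt, errors="coerce")
exit_ = pd.to_datetime(df["time_exit"], format=fmt, errors="coerce")

# With errors='coerce', values that do not match the format become NaT.
# Counting them shows how much of the data failed to parse.
n_bad = int(entry.isna().sum() + exit_.isna().sum())
print("unparsed values:", n_bad)

df["tempo"] = exit_ - entry
df["tempo_days"] = df["tempo"].dt.days  # whole days only; NaN where tempo is NaT
print(df)
```

If every value comes back as NaT, the format string most likely does not match how the timestamps are actually written, so it is worth comparing it against a few raw values.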
Abhinav Kumar