0

I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note not zero-padded day)

if I do df.dtype then it shows the columns as being a objects rather than a string or a datetime. I need to be able to subtract 2 column values to work out the difference so I'm trying to get them into a state where I can do that.

At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'.

I've tried multiple methods but have run into a problem every way I've tried. Any help would be appreciated. If I need to give any more information then I will.

M.Throw
  • 145
  • 1
  • 2
  • 11

3 Answers3

3

As suggested by @MaxU, you can use pd.to_datetime() method to bring the values of the given column to the 'appropriate' format, like this:

df['datetime'] = pd.to_datetime(df.datetime)

You would have to do this on whatever columns you have that you need trasformed to the right dtype.

Alternatively, you can use parse_dates argument of pd.read_csv() method, like this:

df = pd.read_csv(path, parse_dates=[1,2,3])

where columns 1,2,3 are expected to contain data that can be interpreted as dates.

I hope this helps.

Thanos
  • 2,472
  • 1
  • 16
  • 33
2

convert a column to datetime using this approach

df["Date"] = pd.to_datetime(df["Date"])

If column has empty values then change error level to coerce to ignore errors: Details

df["Date"] = pd.to_datetime(df["Date"], errors='coerce')

After which you should be able to subtract two dates.

example:

import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')

consult this answer for more details:

Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

If you want to directly load the column as datetime object while reading from csv, consider this example :

Pandas read csv dateint columns to datetime

Community
  • 1
  • 1
Vikash Singh
  • 13,213
  • 8
  • 40
  • 70
  • This is the error I get `local variable 'mask' referenced before assignment` Curiously, I think it is to do with the fact that I have missing values. I've found that the line works on some columns but not on ones that have NaN values. – M.Throw Dec 11 '16 at 17:28
  • glad i could help :) @M.Throw – Vikash Singh Dec 11 '16 at 17:35
1

I found that the problem was to do with missing values within the column. Using coerce=True so df["Date"] = pd.to_datetime(df["Date"], coerce=True) solves the problem.

M.Throw
  • 145
  • 1
  • 2
  • 11