1

I have a dataframe with an id column, and a date column made up of an integer.

d = {'id': [1, 2], 'date': [20161031, 20170930]}
df = pd.DataFrame(data=d)

   id      date
0   1  20161031
1   2  20170930

I can convert the date column to an actual date like so.

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))

   id       date
0   1 2016-10-31
1   2 2017-09-30

But I need to have this field as a timestamp with with hours, minutes, and seconds so that it is compatible with my database table. I don't care what the the values are, we can keep it easy by setting it to zeros.

2016-10-31 00:00:00
2017-09-30 00:00:00

What is the best way to change this field to a timestamp? I tried

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d%H%M%S'))

but pandas didn't like that.

I think I could append six 0's to the end of every value in that field and then use the above statement, but I was wondering if there is a better way.

Bill
  • 10,323
  • 10
  • 62
  • 85
sanjayr
  • 1,679
  • 2
  • 20
  • 41
  • use `pandas.to_datetime(df['date']) ` that is enough, already discussed at https://stackoverflow.com/questions/32204631/how-to-convert-string-to-datetime-format-in-pandas-python – Serge Aug 19 '20 at 18:43
  • format : string, default None strftime to parse time, eg “%d/%m/%Y”, note that “%f” will parse all the way up to nanoseconds. – gannu Aug 19 '20 at 18:47
  • @Serge are you sure it works in this case though? E.g. `pd.to_datetime(20161031) == pd.Timestamp('1970-01-01 00:00:00.020161031')` – Bill Aug 19 '20 at 19:33
  • You are right, it should be converted to string with `df['date'].apply(str)` – Serge Aug 19 '20 at 20:03

3 Answers3

3

With pandas it is simpler and faster to convert entire columns. First you convert to string and then to time stamp

pandas.to_datatime(df['date'].apply(str))

PS there are few other conversion methods of varying performance https://datatofish.com/fastest-way-to-convert-integers-to-strings-in-pandas-dataframe/

Serge
  • 3,387
  • 3
  • 16
  • 34
2

The problem seems to be that pd.to_datetime doesn't accept dates in this integer format:

pd.to_datetime(20161031) gives Timestamp('1970-01-01 00:00:00.020161031')

It assumes the integers are nanoseconds since 1970-01-01.

You have to convert to a string first:

df['date'] = pd.to_datetime(df["date"].astype(str))

Output:

   id       date
0   1 2016-10-31
1   2 2017-09-30

Note that these are datetimes so they include a time component (which are all zero in this case) even though they are not shown in the data frame representation above.

print(df.loc[0,'date'])

Out:

Timestamp('2016-10-31 00:00:00')
Bill
  • 10,323
  • 10
  • 62
  • 85
1

You can use

df['date'] = pd.to_datetime(df["date"].dt.strftime('%Y%m%d%H%M%S'))

Charles
  • 227
  • 1
  • 5