I have a DataFrame with a date column stored as an object-dtype ndarray of strings. I want to cluster the DataFrame, so I decided to convert each string (like '2009-01-04') in this column to an int. How can I represent it as recency in days, considering today's date as the last date? I ran into some problems with the "to_datetime" method.

index col1 col2  date
 0     a    34  '2009-01-04'
 1     a    21  '2009-01-05'
 2     b    8   '2009-01-06'

Thank you

Trenton McKinney
user12628549
  • Associated question [Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes](https://stackoverflow.com/questions/22923775) – Trenton McKinney Sep 25 '20 at 05:53

1 Answer

I believe you need to convert the dates to datetimes, subtract them from today's date with Series.rsub (subtraction from the right side), and then convert the resulting timedeltas to days with Series.dt.days:

import pandas as pd
# today at midnight minus each parsed date, kept as whole days
df['new'] = pd.to_datetime(df['date']).rsub(pd.Timestamp('now').floor('d')).dt.days
print(df)
  col1  col2          date   new
0    a    34  '2009-01-04'  4282
1    a    21  '2009-01-05'  4281
2    b     8  '2009-01-06'  4280
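
Because the result above depends on the day the code is run, here is a minimal, reproducible sketch of the same idea. It assumes the strings may contain literal quote characters (as they appear in the printed table) and strips them before parsing. The fixed reference date and the column names recency_fixed and recency_latest are illustrative assumptions, not part of the original answer. The second column shows an alternative reading of the question, where the latest date in the column is the reference point.

```python
import pandas as pd

# Sample data mirroring the question; the literal quotes are an assumption
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [34, 21, 8],
    "date": ["'2009-01-04'", "'2009-01-05'", "'2009-01-06'"],
})

# Strip stray quote characters (harmless if there are none), then parse
dates = pd.to_datetime(df["date"].str.strip("'"), format="%Y-%m-%d")

# Assumption: a fixed reference date stands in for "today" so the
# result is the same on every run
reference = pd.Timestamp("2020-09-25")
df["recency_fixed"] = (reference - dates).dt.days

# Alternative: measure recency against the latest date in the column
df["recency_latest"] = (dates.max() - dates).dt.days

print(df)
```

To use the current day instead, replace the fixed reference with pd.Timestamp('now').floor('d'), as in the one-liner above.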
jezrael