I read some of the posts related to this topic, but nothing worked.

I am trying to convert two columns of my dataframe, dem_inclusiondate and sae_hospit_date, to dates because I need to do a survival analysis and need the duration between the inclusion date and the hospitalization.

However, these columns are of type Series, and I can't find a way to convert them to a date type.

I tried this following your comment

  baseline_all_patients["dem_inclusiondate"]
    .to_datetime(baseline_all_patients["dem_inclusiondate"], format="%Y-%m-%d")

but this error occurs: 'Series' object has no attribute 'to_datetime'

Sorry, I am new, and I don't know if my question is clear.

Thank you for your help.

Paul Rumkin
  • https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples I would look into `df['col'] = pd.to_datetime(df['col'])` where df is your dataframe and col is your column name. – David Erickson Jul 14 '20 at 08:51
  • Try `df['dem_inclusiondate'] = pd.to_datetime(df['dem_inclusiondate'])`, else edit your question according to the link above. – ScootCork Jul 14 '20 at 08:51
  • Your code differs from the code in the comments. Please check again. – above_c_level Jul 14 '20 at 10:06

1 Answer

I believe this should help. Let's generate some data.

import pandas as pd

df = pd.DataFrame({'date_begin':['2020.6.7', '2020.5.3', '2020.1.1'],
                   'date_end':['2020.6.17', '2020.6.1', '2020.1.20']})

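Then the syntax to convert strings to dates in pandas is pretty easy: pd.to_datetime is a function of the pandas module, not a method of the Series, so the column is passed to it as an argument. See more in the documentation.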

df['date_begin'] = pd.to_datetime(df['date_begin'], yearfirst=True)
df['date_end']   = pd.to_datetime(df['date_end'],   yearfirst=True)
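Applied to the columns from the question, the same call might look like the sketch below. Only the column names and the `%Y-%m-%d` format come from the question; the sample values are invented for illustration. Passing `errors="coerce"` is an optional choice (not something the question asked for) that turns unparseable entries into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
baseline_all_patients = pd.DataFrame({
    "dem_inclusiondate": ["2020-01-15", "2020-02-01", "2020-03-10"],
    "sae_hospit_date": ["2020-02-14", "2020-02-20", "not recorded"],
})

# pd.to_datetime is called on the module and receives the Series as an argument.
for col in ["dem_inclusiondate", "sae_hospit_date"]:
    baseline_all_patients[col] = pd.to_datetime(
        baseline_all_patients[col],
        format="%Y-%m-%d",
        errors="coerce",  # unparseable values become NaT instead of raising
    )

print(baseline_all_patients.dtypes)
print(baseline_all_patients["sae_hospit_date"].isna().sum())  # rows that failed to parse
```

Counting the `NaT` values afterwards is a quick way to check how many entries did not match the expected format.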

Now, timedeltas might give you some problems. That's because months and years have different lengths, so a difference between two dates has no exact value in those units. Depending on the accuracy you require, you might want to use NumPy's (np) timedelta64 or pandas' own Timedelta.

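import numpy as np

(df['date_end'] - df['date_begin']) / pd.Timedelta('1 days')  # duration in days
(df['date_end'] - df['date_begin']) / np.timedelta64(1, 'D')  # same, with NumPy
# 'M' and 'Y' have no fixed length; newer pandas versions may reject these units
(df['date_end'] - df['date_begin']) / np.timedelta64(1, 'M')
(df['date_end'] - df['date_begin']) / np.timedelta64(1, 'Y')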
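For a survival analysis, a duration in days is usually the least ambiguous choice. The sketch below uses the same dates as above, written in ISO form, and reads the whole-day count from the `.dt.days` accessor. The `duration_years` column is an assumption for illustration: it uses a fixed 365.25-day year, which is a convention rather than an exact calendar calculation.

```python
import pandas as pd

# Same dates as in the example above, written in ISO form.
df = pd.DataFrame({
    "date_begin": ["2020-06-07", "2020-05-03", "2020-01-01"],
    "date_end": ["2020-06-17", "2020-06-01", "2020-01-20"],
})
df["date_begin"] = pd.to_datetime(df["date_begin"], format="%Y-%m-%d")
df["date_end"] = pd.to_datetime(df["date_end"], format="%Y-%m-%d")

# Subtracting two datetime columns gives a timedelta column.
duration = df["date_end"] - df["date_begin"]

# Whole days as plain integers.
df["duration_days"] = duration.dt.days

# Approximate years, assuming a fixed 365.25-day year (a convention, not exact).
df["duration_years"] = df["duration_days"] / 365.25

print(df["duration_days"].tolist())  # [10, 29, 19]
```

Keeping the day count as the primary column and deriving months or years from it with an explicit divisor makes the approximation visible in the code.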
pinegulf
  • Wow... you anticipated the problem I would run into with timedeltas... and it's true, I did get this problem, but I could solve it thanks to you! Thank you so much! – Caroline Chong-Nguyen Jul 14 '20 at 19:49