3

I'm reading in an excel file and unioning it to a csv file.

When I read in the excel file I have a field of dates:

0    2018-05-28 00:00:00
1    9999-12-31 00:00:00
2    2018-02-26 00:00:00
3    2018-02-26 00:00:00
4    2018-02-26 00:00:00
Name: Date_started, dtype: object

I check the data type

df['Date_started'].dtype
dtype('O')

Then when I write out the resultant dataframe to csv I get this:

df.to_csv(folderpath + "Date_Started_df.csv",encoding="UTF-8" , index=False, na_rep='',date_format='%d%m%Y')
Date_Started

28/05/2018 00:00
31/12/9999 00:00
26/02/2018 00:00
26/02/2018 00:00
26/02/2018 00:00

I have tried

df.loc[:,'Date_Started'] = df['Date_Started'].astype('str').str[8:10] + "/" + 
df['Date_Started'].astype('str').str[5:7] + "/" + 
df['Date_Started'].astype('str').str[:4] 

Which gave me:

0    28/05/2018
1    31/12/9999
2    26/02/2018
3    26/02/2018
4    26/02/2018
Name: Date_started, dtype: object

I thought it might be in the writing out:

df.to_csv(filename, date_format='%Y%m%d')

but I still got the times!?

REdim.Learning
  • 655
  • 2
  • 14
  • 32

1 Answers1

1

You need to convert your series to datetime before sending to CSV:

df['Date_Started'] = pd.to_datetime(df['Date_Started'])

This then allows Pandas to execute date_format='%d%m%Y' for the appropriate column with to_csv. The to_csv docs make this explicit:

date_format : string, default None

Format string for datetime objects

jpp
  • 159,742
  • 34
  • 281
  • 339
  • my problem is that I've got fake dates in the list (e.g. 31/12/9999) giving an error "ValueError: Given date string not likely a datetime." – REdim.Learning Jul 23 '18 at 16:35
  • Yeh, that's a problem, one tip is to convert to a non-possible date, e.g. `df['Date_Started'] = pd.to_datetime(df['Date_Started'], errors='coerce').fillna(pd.to_datetime('1950-01-01'))`. Works if you know 1st Jan 1950 is not a valid date. – jpp Jul 23 '18 at 16:36
  • would that leave blanks in the non-possible dates? As I actually need to be able to identify them as they are records without start dates – REdim.Learning Jul 24 '18 at 09:35
  • I was wondering if it was a writing issue as I managed to create "Strings", but times were added when it was written to csv – REdim.Learning Jul 24 '18 at 09:36
  • To leave blanks, use `df['Date_Started'] = pd.to_datetime(df['Date_Started'], errors='coerce')`. Blanks will be `NaT`, also known as "Not a Time". Try it and see what happens. – jpp Jul 24 '18 at 09:36
  • I managed to get NaT, when I write this out is there anyway to write it out as something other than blanks? Baring in mind that I've got other blanks in the data set? – REdim.Learning Jul 24 '18 at 14:37
  • @REdim.Learning, Have a look here: [Remove dtype datetime NaT](https://stackoverflow.com/questions/25141789/remove-dtype-datetime-nat) – jpp Jul 24 '18 at 14:39