I've looked through every thread I can find, and the only one relevant to this kind of formatting issue is the following, but it's for Java:

How parse 2013-03-13T20:59:31+0000 date string to Date

I've got two columns, MAXDT and MINDT, with values like 201604 and 201605 that I need to turn into date values like 2016-04-01 and 2016-05-01. To accomplish this, I've done the following.

# Create the day number used to build a full date
df['DAY_NBR'] = '01'

# Convert max and min date to string for the date transformation
df['MAXDT'] = df['MAXDT'].astype(str)
df['MINDT'] = df['MINDT'].astype(str)

# Append the day number to the max date's year and month
df['MAXDT'] = df['MAXDT'] + df['DAY_NBR']

# Append the day number to the min date's year and month
df['MINDT'] = df['MINDT'] + df['DAY_NBR']

# Convert max and min date to integer values
df['MAXDT'] = df['MAXDT'].astype(int)
df['MINDT'] = df['MINDT'].astype(int)

# Convert max date to datetime
df['MAXDT'] = pd.to_datetime(df['MAXDT'], format='%Y%m%d')

# Convert min date to datetime
df['MINDT'] = pd.to_datetime(df['MINDT'], format='%Y%m%d')

To be honest, I can work with this output, but it's a little messy because the unique values for the two columns are...

MAXDT Values
['2016-07-01T00:00:00.000000000' '2017-09-01T00:00:00.000000000'
 '2018-06-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
 '2017-03-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
 '2017-12-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
 '2018-09-01T00:00:00.000000000' '2018-10-01T00:00:00.000000000'
 '2016-04-01T00:00:00.000000000' '2018-03-01T00:00:00.000000000'
 '2017-05-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
 '2017-02-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
 '2018-01-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
 '2017-06-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
 '2018-05-01T00:00:00.000000000' '2019-11-01T00:00:00.000000000'
 '2016-06-01T00:00:00.000000000' '2017-10-01T00:00:00.000000000'
 '2016-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
 '2016-03-01T00:00:00.000000000' '2016-10-01T00:00:00.000000000'
 '2016-11-01T00:00:00.000000000' '2019-12-01T00:00:00.000000000'
 '2016-09-01T00:00:00.000000000' '2017-08-01T00:00:00.000000000'
 '2016-05-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
 '2017-11-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
 '2017-04-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
 '2016-02-01T00:00:00.000000000' '2019-02-01T00:00:00.000000000'
 '2019-07-01T00:00:00.000000000' '2019-10-01T00:00:00.000000000'
 '2019-09-01T00:00:00.000000000' '2019-03-01T00:00:00.000000000'
 '2019-05-01T00:00:00.000000000' '2019-04-01T00:00:00.000000000'
 '2019-08-01T00:00:00.000000000' '2019-06-01T00:00:00.000000000'
 '2020-02-01T00:00:00.000000000' '2020-01-01T00:00:00.000000000']
MINDT Values
['2016-04-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
 '2016-02-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
 '2017-02-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
 '2017-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
 '2017-10-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
 '2018-05-01T00:00:00.000000000' '2018-09-01T00:00:00.000000000'
 '2018-10-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
 '2016-03-01T00:00:00.000000000' '2017-11-01T00:00:00.000000000'
 '2017-05-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
 '2018-06-01T00:00:00.000000000' '2017-12-01T00:00:00.000000000'
 '2016-10-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
 '2017-06-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
 '2018-03-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
 '2016-08-01T00:00:00.000000000' '2016-06-01T00:00:00.000000000'
 '2018-01-01T00:00:00.000000000' '2016-07-01T00:00:00.000000000'
 '2016-11-01T00:00:00.000000000' '2016-09-01T00:00:00.000000000'
 '2017-04-01T00:00:00.000000000' '2016-05-01T00:00:00.000000000'
 '2017-09-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
 '2017-03-01T00:00:00.000000000']

I'm trying to build a loop that runs through these dates, and it works, but I don't want to have an index with all of these irrelevant zeros and a T in it. How can I convert these empty timestamp values to just the date that is in yyyy-mm-dd format?

Thank you!

Fish357

1 Answer

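Unfortunately, I believe pandas stores datetime values as datetime64[ns] by default, which is why they print with full nanosecond precision. Even if you attempt to save the column as datetime64[D], it will not stay at day resolution: older versions cast it back to datetime64[ns], and pandas 2.0 and later support only second, millisecond, microsecond, and nanosecond units.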

It's possible to store these values as strings instead, but the simplest solution is likely to strip the extra zeros when you loop through them (e.g., by looping over the NumPy array returned by df['MAXDT'].to_numpy().astype('datetime64[D]')), or to reformat each value with datetime, for instance with strftime('%Y-%m-%d').
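
As a rough illustration of both options, here is a minimal sketch. The sample DataFrame is made up to match the YYYYMM layout in the question, and the names max_days and max_strings are only placeholders. It also assumes the column can be parsed directly with format='%Y%m', which would skip the DAY_NBR step because the day defaults to the first of the month.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the question's YYYYMM integer layout.
df = pd.DataFrame({"MAXDT": [201604, 201605], "MINDT": [201601, 201602]})

# Parse year and month directly; the day defaults to the 1st.
for col in ["MAXDT", "MINDT"]:
    df[col] = pd.to_datetime(df[col].astype(str), format="%Y%m")

# Option 1: a day-resolution NumPy array, convenient for looping.
max_days = df["MAXDT"].to_numpy().astype("datetime64[D]")

# Option 2: plain yyyy-mm-dd strings (no longer datetime values).
max_strings = df["MAXDT"].dt.strftime("%Y-%m-%d").tolist()

# Loop over the unique dates without the time component.
for day in np.unique(max_days):
    print(day)
```

The column inside the DataFrame keeps its original datetime dtype either way; only the array or list you loop over is trimmed to the date.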