1

I'm sure that this question has been asked before, but I can't seem to find a already existing question that has the same requirements as mine.

I have a CSV file with 3 columns, DATE, TOTAL and NAME.

           date   Total              name
0    2008-11-21      32     JOHN SMITH
1      08-25-21     7.9     JOHN SMITH
2      08-25-21     7.4     JOHN SMITH
3      08-27-21     100     JOHN SMITH
4    2008-04-21   36.62     JOHN SMITH

As you can see, there are 3 DATES that don't match the "yyyy-mm-dd" format. I want all the dates the follow the specified format. Is this something that is possible using Pandas?

mully
  • 139
  • 6
  • Rows 1,2 and 3 have a issue of being both compatible in dd-mm-yyyy and yy-dd-mm. If you could select the rows and specify the formats, yes. – Celius Stingher Aug 08 '22 at 12:51
  • 2
    Usually, `pd.to_datetime()` is smart enough to infer the format on its own. So try the following: `df["date"] = pd.to_datetime(df["date"])` If this should fail, see [this post](https://stackoverflow.com/questions/47256212/handling-multiple-datetime-formats-with-pd-to-datetime) to with a similar question which was likely solved. You can then convert is back to a string with your desired format: `df["date"].dt.strftime('%Y-%m-%d')` – psalt Aug 08 '22 at 12:52
  • @psalt Thank you for this, this worked exactly how I needed it to. – mully Aug 08 '22 at 12:58
  • `08-25-21` - does it mean `August 25, 2021`? – Vitalizzare Aug 08 '22 at 13:01
  • @mully I'm glad this was the solution for you. I posted my comment as an answer to your post so the solution is more visible in case others have a similar questions. If you like, you can mark that answer as the accepted answer. – psalt Aug 08 '22 at 14:32

3 Answers3

1

After importation and assuming they are imported as strings:

import numpy as np
df['date'] = np.where(len(df['date']) == 8, '20' + df['date'], df['date'])
gtomer
  • 5,643
  • 1
  • 10
  • 21
1

Following gtomer's suggestion and assuming these dates are wrongly formated and the format is the correct one you can also try with .str.pad:

s.str.pad(width=10, side='right', fillchar='20')
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
1

Usually, pd.to_datetime() is smart enough to infer the format on its own. To convert a series or a column of the dataframe to the datetime format you can use:

df["date"] = pd.to_datetime(df["date"]) 

You can then convert the series back to a string with the desired format:

df["date"].dt.strftime('%Y-%m-%d')

When working with (multiple) unusual formats you might need to use a different method, see this similar question.

psalt
  • 384
  • 2
  • 9