-2

Kindly assist me in cleaning my date types in python.

My sample data is as follows:

INITIATION DATE DATE CUT DATE GIVEN
1/July/2022 21 July 2022 11-July-2022
17-July-2022 16/July/2022 21/July/2022
16-July-2022 01-July-2022 09/July/2022
19-July-2022 31 July 2022 27 July 2022

How do I remove all dashes/slashes/hyphens from dates in the different columns? I have 8 columns and 300 rows.

What i tried:

df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']]= df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']].apply(pd.to_datetime, format = '%d%b%Y')

Desired output format for all: 1 July 2022

ValueError I'm getting:

time data '18 July 2022' does not match format '%d-%b-%Y' (match)

James Z
  • 12,209
  • 10
  • 24
  • 44
Toni
  • 1
  • 2
  • 2
    what code are you running that is giving you that error? – scotscotmcc Dec 01 '22 at 20:47
  • Welcome to SO! Please take a look through the guide to creating a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Alexander L. Hayes Dec 01 '22 at 20:49
  • "18 July 2022' does not match format '%d-%b-%Y": why not use a different parsing format string then? One without the dashes, '%d %b %Y' may do a better job for the second column. (Obviously, the third column needs yet another format.) – 9769953 Dec 01 '22 at 20:51
  • Hi @Toni, in order to help you, we need to get more context. Where does the data come from, a csv file, a spread sheet, ...? What have you tried? Can you use libraries such as pandas or are you limited to the standard library because you are not allowed to install anything into your python environment?... – Carlos Horn Dec 01 '22 at 20:53
  • Hi@ Carlos Horn. Data source is an excel file. You can install anything. I am using pandas but cannot really navigate. I'm a newbie – Toni Dec 01 '22 at 21:00
  • @9769953 I cannot get a single parsing method to format all columns at once instead of formatting each? – Toni Dec 01 '22 at 21:02
  • Does this answer your question? [Convert Pandas Column to DateTime](https://stackoverflow.com/questions/26763344/convert-pandas-column-to-datetime) – Pranav Hosangadi Dec 01 '22 at 23:44

2 Answers2

1

to remove all dashes/slashes/hyphens from strings you can just use replace method:

df.apply(lambda x: x.str.replace('[/-]',' ',regex=True))

>>>
'''
  INITIATION DATE      DATE CUT    DATE GIVEN
0     1 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022

and if you also need to conver strings to datetime then try this:

df.apply(lambda x: pd.to_datetime(x.str.replace('[/-]',' ',regex=True)))

>>>
'''
  INITIATION DATE   DATE CUT DATE GIVEN
0      2022-07-01 2022-07-21 2022-07-11
1      2022-07-17 2022-07-16 2022-07-21
2      2022-07-16 2022-07-01 2022-07-09
3      2022-07-19 2022-07-31 2022-07-27
SergFSM
  • 1,419
  • 1
  • 4
  • 7
0

You can use pd.to_datetime to convert strings to datetime objects. The function takes a format argument which specifies the format of the datetime string, using the usual format codes

df['INITIATION DATE'] = pd.to_datetime(df['INITIATION DATE'], format='%d-%B-%Y').dt.strftime('%d %B %Y')
df['DATE CUT'] = pd.to_datetime(df['DATE CUT'], format='%d %B %Y').dt.strftime('%d %B %Y')
df['DATE GIVEN'] = pd.to_datetime(df['DATE GIVEN'], format='%d/%B/%Y').dt.strftime('%d %B %Y')

output

INITIATION DATE      DATE CUT    DATE GIVEN
0    01 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022

You get that error because your datetime strings (e.g. '18 July 2022') do not match your format specifiers ('%d-%b-%Y') because of the extra hyphens in the format specifier.

Pranav Hosangadi
  • 23,755
  • 7
  • 44
  • 70
Hannon qaoud
  • 785
  • 2
  • 21
  • Hi @Hannon qaoud. Within each column the dates format are mixed up. Therefore i end up with same error. Initiation column for instance has 01 July 2022, 05/July/2022, 07-July-2022, – Toni Dec 02 '22 at 04:02
  • I understand now, maybe provide more complex test cases to cover all possibilities, also I do recommend @SergFSM's solution. – Hannon qaoud Dec 02 '22 at 14:28