I have a dataframe df
as follows where a Date
field is present. I am reading this from excel using pd.read_excel()
method.
Company Code Trxn_date Company Name Type
20040 2019-05-11 00:00:00 ABC Series A #<--the date is in `datetime` object only.
20034 2019-04-26 00:00:00 XYZ Series A
20033 "5/15/2018\n23/4/2019" PQR "Series A
Series B" # <-- In same row.
20040 2019-06-05 00:00:00 ABC Series B
20056 8/16/2019 MNO Series B
As you can see that for 20033
there are two entries in Trxn_date
apart by \n
. Same is true for Type
field. Because of this, if I apply pd.to_datetime(df['Trxn_date'])
I get an obvious error TypeError: invalid string coercion to datetime
. I do not want to coerce
option.
Please note that except 20033
and 20056
all dates are automatically converted to datetime
object by pandas.
I would like to get the df
as follows.
Company Code Trxn_date Company Name Type
20040 2019-05-11 ABC Series A
20034 2019-04-26 XYZ Series A
20033 2019-04-23 PQR Series B #<--Only the last date string is picked up and converted to datetime.
20040 2019-06-05 ABC Series B
20056 2019-08-16 MNO Series B #<--The date format is changed to `yyyy-mm-dd`.
I am not able to get any clue to achieve the same above. For 20056
I can use pd.to_datetime(df['Trxn_date'],errors='coerce').apply(lambda x : x.strftime('%Y-%m-%d') if pd.notnull(x) else ' ')
. This operation creates a blank in Trxn_date
field for 20033
.
Can anybody give any insight on this? May be I have to write a function and then use lambda
for the same?