I am trying to build a machine learning model using an excel spreadsheet that cannot be edited. The a few of the columns in the .xls have formatting issues so some of the data is displayed as a datetime stamp instead of an str or int. Here is an example from the pd dataframe:
0 40-49 premeno 15-19 0-2 yes 3
1 50-59 ge40 15-19 0-2 no 1
2 50-59 ge40 35-39 0-2 no 2
3 40-49 premeno 35-39 0-2 yes 3
4 40-49 premeno 30-34 **2019-05-03 00:00:00** yes 2
In line 4, the value of 3-5 has been accidentally formatted as a date (shown as 03 May in the xls) and so is assigned as a datetime stamp in the dataframe. I have tried many methods to replace 2019-05-03 00:00:00
with 3-5
including:
df['column'] = df['column'].replace([('2019-05-03 00:00:00')], '3-5')
and using Timestamp.replace but neither seem to work. Any ideas of how to replace this mis formatted data points with the correct data?