I have a time column stored as strings (e.g. "10:27:30 PM") and an int column holding the day of the month. I want to clean this data for my machine learning model. I converted the time column to a datetime type with `df['Time'] = df['Time'].astype('datetime64')`. The resulting values contain today's date plus the time in 24-hour format (e.g. 2020-08-28 10:27:30). I also converted the 'Day of the month' column using

```python
df[['Pickup - Day of Month']] = pd.to_datetime(df['Pickup - Day of Month'], format="%d")
```

and its values became dates such as '1900-01-31', where 31 is the day of the month. I also tried splitting the day, hours, minutes and seconds into separate columns, but the resulting columns are all of type int. How can I clean data like this in pandas for my machine learning models? Any suggestions?
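The following is a minimal reproduction of both conversions. It uses a small hypothetical sample in place of the real data, so the two rows are made up. When an explicit `format` is given, any date part missing from the string defaults to 1900-01-01, which is where `1900-01-31` comes from. The 12-hour directives `%I ... %p` read the AM/PM suffix correctly. The day column is cast to `str` here, as an assumption, so that it stays on the string-parsing path.

```python
import pandas as pd

# Hypothetical sample mirroring the question's two columns
df = pd.DataFrame({
    "Time": ["10:27:30 PM", "09:05:00 AM"],
    "Pickup - Day of Month": [31, 5],
})

# Explicit 12-hour format: %I is the hour (01-12), %p reads AM/PM.
# The strings carry no date, so the default date 1900-01-01 is filled in.
parsed_time = pd.to_datetime(df["Time"], format="%I:%M:%S %p")

# Parsing only the day (%d) likewise defaults the year and month to 1900-01
parsed_day = pd.to_datetime(df["Pickup - Day of Month"].astype(str), format="%d")

print(parsed_time.tolist())
print(parsed_day.tolist())
```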

desertnaut
    Welcome to SO. Instead of just dumping some text, show some sample data, code and expected output; that would have a better chance of getting attention [`see here`](https://stackoverflow.com/q/20109391/4985099). – sushanth Aug 28 '20 at 07:06

1 Answer


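Take a look at the `origin` parameter of `pd.to_datetime`. It sets the reference date that numeric values are counted from. That lets you pick any date you want, so the day numbers land in, e.g., August 2020 instead of January 1900. Numeric values are counted in multiples of `unit` since `origin`, and `unit` defaults to nanoseconds. So pass `unit='D'`, and subtract 1 so that day 1 maps to the origin date itself.

Then add your time to this date column with `pd.to_timedelta`. It parses 24-hour duration strings such as `'22:27:30'`, so convert the 12-hour `'10:27:30 PM'` strings to that form first:

```python
import pandas as pd
# unit='D' counts days from origin; subtract 1 so that day 1 -> 2020-08-01
day = pd.to_datetime(df['Pickup - Day of Month'] - 1, unit='D', origin=pd.Timestamp('2020-08-01'))
df['DateTimeColumn'] = day + pd.to_timedelta(pd.to_datetime(df['Time'], format='%I:%M:%S %p').dt.strftime('%H:%M:%S'))
```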
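Here is a self-contained sketch of the approach above. It runs on hypothetical sample rows and assumes every pickup falls in August 2020. It skips the `strftime`/`pd.to_timedelta` round trip. Instead, it takes the time of day as the difference between each parsed timestamp and its own midnight (`.dt.normalize()`), which gives the same `Timedelta`. The last two columns, `pickup_hour` and `pickup_dayofweek`, are hypothetical names. They show one way to turn the combined column into integer features for a model.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question
df = pd.DataFrame({
    "Time": ["10:27:30 PM", "09:05:00 AM"],
    "Pickup - Day of Month": [31, 5],
})

# Assumption: every row falls in August 2020 (the origin used in the answer)
origin = pd.Timestamp("2020-08-01")

# With numeric input and unit="D", origin is the date for value 0,
# so subtract 1 to map day-of-month 1 onto 2020-08-01
pickup_date = pd.to_datetime(df["Pickup - Day of Month"] - 1, unit="D", origin=origin)

# Parse the 12-hour strings, then keep only the time of day as a Timedelta
# by subtracting each timestamp's midnight (normalize())
parsed_time = pd.to_datetime(df["Time"], format="%I:%M:%S %p")
time_of_day = parsed_time - parsed_time.dt.normalize()

df["DateTimeColumn"] = pickup_date + time_of_day

# Example integer features derived from the combined column
df["pickup_hour"] = df["DateTimeColumn"].dt.hour
df["pickup_dayofweek"] = df["DateTimeColumn"].dt.dayofweek  # Monday == 0

print(df)
```

Keeping a single datetime column and deriving features from it avoids juggling several partial int columns. It also keeps each row's date and time consistent with each other.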
RichieV