0

I have a date column, which is formatted as object in the following format:

May 12, 2021
Apr 1, 2019

I would need to change the data type to date. I found a way, which carves out DD MMM YYYY into separated columns and then creates final date column afterwards. However this is obviously not the most elegant way. Could you advise me better solution, please?

data['month'] = data['release_date'].str.split(' ').str[0]
data['year'] = data['release_date'].str.split(', ').str[1]
data['day'] = data['release_date'].str.split(' ').str[1].replace(',', '')

data['date'] = pd.to_datetime(data[['year', 'month', 'day']])
Srpic
  • 450
  • 5
  • 13
  • 29
  • `df["release_date"].apply(pd.to_datetime)`..... – ddejohn Mar 20 '22 at 06:16
  • `pd.to_datetime(df["release_date"])` – furas Mar 20 '22 at 06:18
  • It does not as both ways return this error message - Can only use .str accessor with string values! – Srpic Mar 20 '22 at 06:18
  • That makes no sense. If your `release_date` column are strings, you should be able to call `pd.to_datetime` on the column. Please show the exact code that produces this error, and a sample of your dataframe. – ddejohn Mar 20 '22 at 06:19
  • 1
    You shouldn't split to columns but use directly `to_datetime` with pattern like `%B %d, %Y` (if I'm not wrong) – furas Mar 20 '22 at 06:19
  • 1
    when I test your data with `pd.to_datetime(df["release_date"])` then I get result without error – furas Mar 20 '22 at 06:22
  • 1
    I can't reproduce your problem. Maybe you should show example data directly in code as `DataFrame`. – furas Mar 20 '22 at 06:28
  • 1
    and if you get error message with `to_datetime` then you should show it in your question because we can't run your code with your real data, and we can't read in your mind. – furas Mar 20 '22 at 06:31

1 Answers1

0

I have no idea what is problem because your code gives me error

ValueError: Unable to parse string "May" at position 0

but this works for me without error

import pandas as pd

data = pd.DataFrame({
    'release_date': ['May 12, 2021', 'Apr 1, 2019']
})

data['date'] = pd.to_datetime(data['release_date'])

# or 
data['date'] = pd.to_datetime(data['release_date'], format='%b %d, %Y')

print(data)

Result

   release_date       date
0  May 12, 2021 2021-05-12
1   Apr 1, 2019 2019-04-01

Maybe all this depends on natives settings in system - different countries can use different dates (i.e Y-M-D, D-M-Y, M D, Y) and later it can make differences in code. And this would need more details about problem.

furas
  • 134,197
  • 12
  • 106
  • 148