1

I have a dataframe with two columns, one of which holds dates formatted like 2021-05-01. I would like to remove the day and month and keep only the year. I tried:

df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y')

But apparently at least one of the rows has "00" for the month and/or day so this returned an error. I tried the solution here but it returned the following error:

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I'm very much a beginner and not sure what to do here. Thank you!

morgan

2 Answers

2

If the year always appears as four digits (YYYY), extract it with Series.str.extract:

df['year'] = df['date'].str.extract(r'(\d{4})', expand=False)

Or take the first 4 characters:

df['year'] = df['date'].str[:4]

Or remove the last 6 characters:

df['year'] = df['date'].str[:-6]
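
A minimal sketch of the string-based approach, assuming the column holds text in YYYY-MM-DD form. The sample frame, the `value` column and the `year_int` column are made up for illustration; the second row has "00" for month and day to mimic the problem row from the question.

```python
import pandas as pd

# Hypothetical sample data; the second row is not a valid calendar date.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00"], "value": [1, 2]})

# Work on the raw text, so the invalid row never reaches a date parser.
df["year"] = df["date"].astype(str).str[:4]

# Optional: convert the year strings to integers.
df["year_int"] = df["year"].astype(int)

print(df)
```

Because nothing is parsed as a date, every row is kept, including the one with "00".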
jezrael
0

Try using dt.year

df.date = pd.to_datetime(df.date).dt.year
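
If some rows are not valid dates, `pd.to_datetime` raises on them by default. One possible variation, sketched here with made-up sample data, is to pass `errors='coerce'` so invalid rows become missing values instead of raising. Note that this loses the year for those rows, so it may not fit a case where every row has to keep its year.

```python
import pandas as pd

# Hypothetical sample data; the second row has "00" for month and day.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00"]})

# Invalid dates become NaT rather than raising an error.
parsed = pd.to_datetime(df["date"], errors="coerce")

# .dt.year returns NaN for the NaT rows, so the result is a float column.
df["year"] = parsed.dt.year

print(df)
```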
Raymond Toh
  • `But apparently at least one of the rows has "00" for the month and/or day so this returned an error.` - So `pd.to_datetime(df.date)` failed. – jezrael Sep 21 '21 at 06:38
  • @jezrael That is true. I would advise dropping the row, since the day/month should not be 00. The year in that row is probably dirty data too. – Raymond Toh Sep 21 '21 at 06:44
  • @RaymondToh if it was real data I would've dropped it, but it's for homework so I'm supposed to have all the rows :) – morgan Sep 27 '21 at 12:33