I am trying to understand how I can edit a DataFrame in Python using pandas so that I drop everything but the year.

Example: if the date is 2014-01-01, I want it to show 2014 and drop both the month and the day. All the dates are in a single column.

Thanks in advance!

  • Could you post an example data frame, and what you have tried so far? – PacketLoss Mar 16 '20 at 01:10
  • What format is your data in? Are they datetime objects or strings? – Muon Mar 16 '20 at 01:22
  • The date is stored in (datetime64) format.. so far I have tried combining by the year (city_data['date_value'].combine(2014)) – Chris Loftus Mar 16 '20 at 01:35
  • I think you are not the first to run into this issue, please take a look at: https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column – Hennep Mar 16 '20 at 01:46
  • What is the issue, exactly? Please see [ask], [help/on-topic]. – AMC Mar 16 '20 at 03:14

1 Answer

You can convert a numpy.datetime64 value to a pandas Timestamp using pd.to_datetime() and then extract the year, month, or day from it.

import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)

Output:

numpy.datetime64

Convert this date to pandas datetime using pd.to_datetime.

date = pd.to_datetime(date)
type(date)

Output:

pandas._libs.tslibs.timestamps.Timestamp

Then you can extract the year using the .year attribute:

date.year

Output:

2014

So, if you have a df, you can replace the whole column with its year:

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df

Output:

   date
0  2014
1  2015
2  2016
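
Since the question says the column is already stored as datetime64, the `.dt` accessor is another common way to get the same result. The following is a minimal sketch under that assumption; the `year` column name is hypothetical and is used here only so the original dates are kept alongside the extracted year.

```python
import pandas as pd

# Sample data; pd.to_datetime guarantees a datetime64 column.
df = pd.DataFrame({'date': pd.to_datetime(['2014-01-01', '2015-06-15', '2016-12-31'])})

# .dt exposes datetime components of a datetime64 Series.
# 'year' is a hypothetical new column; assign to 'date' instead to overwrite it.
df['year'] = df['date'].dt.year
print(df)
```

Writing to a separate column leaves the full dates available for later filtering or grouping.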

Alternatively, you can use strftime. Note that this stores the year as a string rather than an integer:

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df

Output:

   date
0  2014
1  2015
2  2016
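
The two outputs print identically, but the values differ in type, which matters for sorting, arithmetic, and comparisons. A small sketch of the difference, assuming a datetime64 Series (the variable names are illustrative only):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2014-01-01', '2015-01-01', '2016-01-01']))

as_text = dates.dt.strftime('%Y')  # strings such as '2014'
as_number = dates.dt.year          # integers such as 2014

print(type(as_text.iloc[0]))       # str
print(as_number.iloc[0] + 1)       # integer arithmetic works: 2015

# Strings can be converted back to integers if needed.
converted = as_text.astype(int)
```

If the year is going to be compared with numbers or used in calculations, the integer form is usually the more convenient one.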

EDIT 1

Group by using year when the column has date values

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()

Output:

date
2014    1
2015    1
2016    1

You can still do the same even if you have already removed the month and day and the column holds only the year:

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()

Output:

date
2014    1
2015    1
2016    1
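
The examples above use one date per year. As a rough sketch of the case where many events fall on different days, the grouping can be done on the year alone without dropping the month and day from the column. The `events` frame and its dates are made-up sample data, not the asker's data.

```python
import pandas as pd

# Made-up sample: several events on different days within each year.
events = pd.DataFrame({
    'date': pd.to_datetime(['2014-01-01', '2014-07-19', '2015-03-02',
                            '2015-11-30', '2015-12-25', '2016-05-05']),
})

# Group on the year component only; the 'date' column itself is unchanged.
counts = events.groupby(events['date'].dt.year).size()
print(counts)
```

Because the grouping key is computed on the fly, the full dates remain in the frame for any later analysis by month or day.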
davidbilla
  • This is awesome! The only problem is that not all the dates are 2014-01-01. There are thousands of events that occur on any given day through the different years. Is there no way to group them by year if I drop all the months and days? – Chris Loftus Mar 22 '20 at 17:55
  • @ChrisLoftus You can still use `df.groupby` on a column that has only date values or on a column that has only year values. See EDIT 1 above. Let me know if this is what you are expecting. – davidbilla Mar 23 '20 at 05:13