You can convert the numpy.datetime64 date value to a pandas datetime (a Timestamp) using pd.to_datetime(), and then extract the year, month, or day from it.
import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)
Output:
numpy.datetime64
Convert this date to a pandas datetime using pd.to_datetime:
date = pd.to_datetime(date)
type(date)
Output:
pandas._libs.tslibs.timestamps.Timestamp
Then you can extract the year using .year (.month and .day work the same way):
date.year
Output:
2014
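A minimal sketch pulling all three calendar fields from the converted value (the date '2014-03-15' is just an illustrative input so that month and day differ from 1):

```python
import numpy as np
import pandas as pd

# Convert a numpy.datetime64 value to a pandas Timestamp
date = pd.to_datetime(np.datetime64('2014-03-15'))

# The Timestamp exposes the calendar fields as integer attributes
year, month, day = date.year, date.month, date.day
print(year, month, day)  # 2014 3 15
```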
So, if you have a DataFrame:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df
Output:
date
0 2014
1 2015
2 2016
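An alternative worth considering is the Series .dt accessor, which works element-wise on a datetime64 column without building an intermediate DatetimeIndex. This sketch writes the results to new 'year' and 'month' columns (names chosen for illustration) so the original dates are kept:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'),
                            np.datetime64('2015-01-01'),
                            np.datetime64('2016-01-01')]})

# .dt exposes datetime attributes on every element of the Series
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
print(df)
```

Writing to separate columns instead of overwriting 'date' means you can still extract other fields later.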
Alternatively, you can format each value with strftime. Note that this gives the year as a string ('2014') rather than an integer:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df
Output:
date
0 2014
1 2015
2 2016
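The two approaches print the same but produce different types, which matters if you later sort, compare, or do arithmetic on the year. A small sketch (using the vectorized s.dt.strftime, which does the same thing as the apply/lambda version):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.datetime64('2014-01-01'), np.datetime64('2015-01-01')])

as_str = s.dt.strftime('%Y')  # strings such as '2014'
as_int = s.dt.year            # integers such as 2014

print(as_str.tolist())  # ['2014', '2015']
print(as_int.tolist())  # [2014, 2015]
```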
EDIT 1
To group by year when the column holds date values:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()
Output:
date
2014 1
2015 1
2016 1
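If you would rather not move the dates into the index, you can probably group on the extracted year directly. A minimal sketch, with two dates in 2014 so the counts differ:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'),
                            np.datetime64('2014-06-01'),
                            np.datetime64('2015-01-01')]})

# Group on the year Series; the 'date' column itself is left intact
counts = df.groupby(df['date'].dt.year).size()
print(counts)
```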
You can do the same even if you have already replaced the dates with just the year in the column:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()
Output:
date
2014 1
2015 1
2016 1
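.size() only counts rows per year. To aggregate a value per year instead, select that column before calling the aggregation. In this sketch 'sales' is a hypothetical column added purely for illustration:

```python
import numpy as np
import pandas as pd

# 'sales' is a hypothetical value column, not part of the original example
df = pd.DataFrame({
    'date': [np.datetime64('2014-01-01'), np.datetime64('2014-07-01'),
             np.datetime64('2015-01-01')],
    'sales': [10, 5, 7],
})

# Sum the value column per calendar year instead of just counting rows
totals = df.groupby(df['date'].dt.year)['sales'].sum()
print(totals)
```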