- The question is a duplicate, but this answer has been left to clarify a solution.
- Use the `.dt` accessor to extract the year once the column has been converted to a datetime dtype, such as `datetime64[ns]` (or `datetime64[ns, UTC]` when the values are timezone-aware).
- The sample data is all `'UTC'`, so `pd.to_datetime(df.datum, utc=True)` should work without using `format`.
- Note that converting a column without specifying `format` takes longer, because the date format must be inferred.
- If there are varying time zones, then try specifying the format.
- Select the correct format codes to set the `format` parameter in `pandas.to_datetime()`.
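As a quick way to check a candidate format string before applying it to a whole column, the sketch below parses one of the sample strings with the standard library's `datetime.strptime`, which uses the same style of format codes. The variable names `sample` and `fmt` are only illustrative, and the sketch assumes the default English day and month names.

```python
from datetime import datetime

# One value copied from the sample data in this answer
sample = 'Fri Aug 07, 2020 05:12 UTC'

# %a = abbreviated weekday, %b = abbreviated month, %d = day of month,
# %Y = four-digit year, %H:%M = 24-hour time, %Z = time zone name
fmt = '%a %b %d, %Y %H:%M %Z'

# strptime raises ValueError if the string does not match the format
parsed = datetime.strptime(sample, fmt)
print(parsed.year)
```

If this raises a `ValueError`, the format codes do not match the string and need adjusting before being passed to `pd.to_datetime`.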
import pandas as pd
# test data
data = {'datum': ['Fri Aug 07, 2020 05:12 UTC', 'Thu Aug 06, 2020 04:01 UTC', 'Tue Aug 04, 2020 23:57 UTC', 'Thu Jul 30, 2020 21:25 UTC', 'Thu Jul 30, 2020 11:50 UTC', 'Sat Jul 25, 2020 03:13 UTC', 'Thu Jul 23, 2020 14:26 UTC', 'Thu Jul 23, 2020 04:41 UTC', 'Mon Jul 20, 2020 21:30 UTC', 'Sun Jul 19, 2020 21:58 UTC', 'Wed Jul 15, 2020 13:46 UTC', 'Fri Jul 10, 2020 04:17 UTC', 'Thu Jul 09, 2020 12:11 UTC', 'Mon Jul 06, 2020 01:00 UTC', 'Sat Jul 04, 2020 23:44 UTC', 'Sat Jul 04, 2020 21:19 UTC', 'Fri Jul 03, 2020 03:10 UTC', 'Tue Jun 30, 2020 20:10 UTC', 'Tue Jun 23, 2020 01:43 UTC']}
df = pd.DataFrame(data)
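# Option 1: specify the format explicitly (try this if you have varying time zones)
df.datum = pd.to_datetime(df.datum, format='%a %b %d, %Y %H:%M %Z')
# Option 2: if every value is UTC, let pandas infer the format (use instead of Option 1)
df.datum = pd.to_datetime(df.datum, utc=True)
# extract the year with the .dt accessor
df['Year'] = df.datum.dt.year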
datum Year
2020-08-07 05:12:00+00:00 2020
2020-08-06 04:01:00+00:00 2020
2020-08-04 23:57:00+00:00 2020
2020-07-30 21:25:00+00:00 2020
2020-07-30 11:50:00+00:00 2020
2020-07-25 03:13:00+00:00 2020
2020-07-23 14:26:00+00:00 2020
2020-07-23 04:41:00+00:00 2020
2020-07-20 21:30:00+00:00 2020
2020-07-19 21:58:00+00:00 2020
2020-07-15 13:46:00+00:00 2020
2020-07-10 04:17:00+00:00 2020
2020-07-09 12:11:00+00:00 2020
2020-07-06 01:00:00+00:00 2020
2020-07-04 23:44:00+00:00 2020
2020-07-04 21:19:00+00:00 2020
2020-07-03 03:10:00+00:00 2020
2020-06-30 20:10:00+00:00 2020
2020-06-23 01:43:00+00:00 2020
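For the varying time zone case, here is a minimal sketch. It assumes the strings carry numeric UTC offsets (matched with `%z`) rather than zone abbreviations; the two values in `mixed` are made up for illustration and are not part of the original data. Passing `utc=True` together with `format` should convert every value to UTC, so the `.dt` accessor works on a single timezone-aware column.

```python
import pandas as pd

# Hypothetical strings with different UTC offsets (not from the question)
mixed = pd.Series([
    'Fri Aug 07, 2020 05:12 +0000',
    'Thu Aug 06, 2020 04:01 +0200',
])

# %z matches a numeric offset such as +0200; utc=True normalizes to UTC
parsed = pd.to_datetime(mixed, format='%a %b %d, %Y %H:%M %z', utc=True)

# The second value is shifted from 04:01 +0200 to 02:01 UTC
years = parsed.dt.year
print(parsed)
print(years.tolist())
```

Extracting the year after normalizing to UTC means a timestamp close to midnight on New Year's Eve could land in a different year than its local time suggests, so it may be worth deciding up front which zone the year should be taken from.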