
I have the following table of dates, stored as strings.

How can I transform those string dates so that they show just the numeric year?

I need to do this in Python, on a pandas DataFrame.

I know one option is to slice the strings, but is there something more efficient?

datum
Fri Aug 07, 2020 05:12 UTC
Thu Aug 06, 2020 04:01 UTC
Tue Aug 04, 2020 23:57 UTC
Thu Jul 30, 2020 21:25 UTC
Thu Jul 30, 2020 11:50 UTC
Sat Jul 25, 2020 03:13 UTC
Thu Jul 23, 2020 14:26 UTC
Thu Jul 23, 2020 04:41 UTC
Mon Jul 20, 2020 21:30 UTC
Sun Jul 19, 2020 21:58 UTC
Wed Jul 15, 2020 13:46 UTC
Fri Jul 10, 2020 04:17 UTC
Thu Jul 09, 2020 12:11 UTC
Mon Jul 06, 2020 01:00 UTC
Sat Jul 04, 2020 23:44 UTC
Sat Jul 04, 2020 21:19 UTC
Fri Jul 03, 2020 03:10 UTC
Tue Jun 30, 2020 20:10 UTC
Tue Jun 23, 2020 01:43 UTC
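For comparison, the string-slicing option mentioned in the question could look like the following minimal sketch. It assumes every value has exactly the layout shown (three-letter day, three-letter month, two-digit day), so the year always occupies characters 12 to 15; the regex variant is a slightly more tolerant alternative. The column names year_slice and year_regex are only illustrative.

```python
import pandas as pd

# a few rows of the sample data from the question
df = pd.DataFrame({'datum': ['Fri Aug 07, 2020 05:12 UTC',
                             'Thu Jul 30, 2020 21:25 UTC',
                             'Tue Jun 23, 2020 01:43 UTC']})

# positional slicing: assumes every string has the same fixed layout,
# so the 4-digit year always sits at characters 12-15
df['year_slice'] = df['datum'].str[12:16].astype(int)

# regex extraction: assumes the year is the first 4-digit number after the comma,
# which tolerates small layout changes such as single-digit days
df['year_regex'] = df['datum'].str.extract(r',\s*(\d{4})', expand=False).astype(int)

print(df)
```

Both variants avoid date parsing entirely, but they only give you the year; converting to a datetime dtype (as in the answers) keeps the month, day, and time available as well.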
Trenton McKinney
johnadem

2 Answers

  • The question is a duplicate, but this answer has been left to clarify a solution.
  • Use the .dt accessor to extract the year once the column has been converted to a datetime64[ns] dtype (datetime64[ns, UTC] when the values are time-zone aware).
  • The sample data is all 'UTC', so pd.to_datetime(df.datum, utc=True) should work without specifying format.
    • Note that converting a column without specifying format can be slower, because pandas has to infer the date format.
    • If there are varying time zones, try specifying the format instead.
    • Choose the correct format codes for the format argument of pandas.to_datetime().
import pandas as pd

# test data
data = {'datum': ['Fri Aug 07, 2020 05:12 UTC', 'Thu Aug 06, 2020 04:01 UTC', 'Tue Aug 04, 2020 23:57 UTC', 'Thu Jul 30, 2020 21:25 UTC', 'Thu Jul 30, 2020 11:50 UTC', 'Sat Jul 25, 2020 03:13 UTC', 'Thu Jul 23, 2020 14:26 UTC', 'Thu Jul 23, 2020 04:41 UTC', 'Mon Jul 20, 2020 21:30 UTC', 'Sun Jul 19, 2020 21:58 UTC', 'Wed Jul 15, 2020 13:46 UTC', 'Fri Jul 10, 2020 04:17 UTC', 'Thu Jul 09, 2020 12:11 UTC', 'Mon Jul 06, 2020 01:00 UTC', 'Sat Jul 04, 2020 23:44 UTC', 'Sat Jul 04, 2020 21:19 UTC', 'Fri Jul 03, 2020 03:10 UTC', 'Tue Jun 30, 2020 20:10 UTC', 'Tue Jun 23, 2020 01:43 UTC']}
df = pd.DataFrame(data)

# option 1: if you have varying time zones, specify the format explicitly
# df['datum'] = pd.to_datetime(df['datum'], format='%a %b %d, %Y %H:%M %Z')

# option 2: if all the dates are UTC, let pandas infer the format
df['datum'] = pd.to_datetime(df['datum'], utc=True)

# extract the year from the datetime column with the .dt accessor
df['Year'] = df['datum'].dt.year

                    datum  Year
2020-08-07 05:12:00+00:00  2020
2020-08-06 04:01:00+00:00  2020
2020-08-04 23:57:00+00:00  2020
2020-07-30 21:25:00+00:00  2020
2020-07-30 11:50:00+00:00  2020
2020-07-25 03:13:00+00:00  2020
2020-07-23 14:26:00+00:00  2020
2020-07-23 04:41:00+00:00  2020
2020-07-20 21:30:00+00:00  2020
2020-07-19 21:58:00+00:00  2020
2020-07-15 13:46:00+00:00  2020
2020-07-10 04:17:00+00:00  2020
2020-07-09 12:11:00+00:00  2020
2020-07-06 01:00:00+00:00  2020
2020-07-04 23:44:00+00:00  2020
2020-07-04 21:19:00+00:00  2020
2020-07-03 03:10:00+00:00  2020
2020-06-30 20:10:00+00:00  2020
2020-06-23 01:43:00+00:00  2020
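To check the point about inferred formats on your own data, a rough timing sketch like the one below could be used. It builds a hypothetical column of unique daily dates in the same string layout (unique values matter, because pd.to_datetime caches repeated strings by default), and the helper names with_format and inferred are only illustrative. No timings are given here because they depend on the pandas version and the machine.

```python
import timeit
import warnings

import pandas as pd

# hypothetical test column: 5,000 unique dates formatted like the question's data
dates = pd.date_range('2020-01-01', periods=5_000, freq='D')
s = pd.Series(dates.strftime('%a %b %d, %Y %H:%M UTC'))

def with_format():
    # explicit format: pandas does not need to infer anything
    return pd.to_datetime(s, format='%a %b %d, %Y %H:%M %Z')

def inferred():
    # no format: pandas has to infer it and, depending on the version,
    # may warn and fall back to parsing each element individually
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', UserWarning)
        return pd.to_datetime(s, utc=True)

print('explicit format:', timeit.timeit(with_format, number=3))
print('inferred format:', timeit.timeit(inferred, number=3))
```

Both calls should produce the same years; only the time taken to get there differs.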
Trenton McKinney

Suppose this column of data is in a DataFrame called df. Then

import pandas as pd

df['datum'] = pd.to_datetime(df['datum'], format='%a %b %d, %Y %H:%M UTC')

converts the string dates into a datetime64 column of pandas Timestamp objects.

Now, if you just want the year of a single date, you can take any element of the column and access its year attribute, for example df['datum'][0].year.
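The difference between taking the year of one element and taking it for the whole column can be seen in this small sketch, which uses two rows of the sample data. It uses .iloc[0], which selects by position; df['datum'][0] selects by the index label 0, which gives the same row here only because the DataFrame has the default RangeIndex.

```python
import pandas as pd

df = pd.DataFrame({'datum': ['Fri Aug 07, 2020 05:12 UTC',
                             'Tue Jun 23, 2020 01:43 UTC']})
df['datum'] = pd.to_datetime(df['datum'], format='%a %b %d, %Y %H:%M UTC')

# a single element is a pandas Timestamp, which has a plain .year attribute
first_year = df['datum'].iloc[0].year

# the whole column is a Series, so the year is reached through the .dt accessor
all_years = df['datum'].dt.year

print(first_year)
print(all_years.tolist())
```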

If you want to make a column of just the years:

df['year'] = df['datum'].dt.year
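One detail about this format string: because 'UTC' is written literally rather than as %Z, pandas matches it as plain text and the parsed values are time-zone naive. A minimal sketch, assuming a time-zone-aware column is wanted, attaches UTC afterwards with .dt.tz_localize; the extracted year is the same either way.

```python
import pandas as pd

s = pd.Series(['Fri Aug 07, 2020 05:12 UTC', 'Tue Jun 23, 2020 01:43 UTC'])

# 'UTC' written literally in the format is matched as text only,
# so the parsed values carry no time zone information
naive = pd.to_datetime(s, format='%a %b %d, %Y %H:%M UTC')
print(naive.dt.tz)  # None

# if a time-zone-aware column is needed, attach UTC explicitly
aware = naive.dt.tz_localize('UTC')
print(aware.dt.tz)

# localizing keeps the wall-clock time, so the years are identical
print((naive.dt.year == aware.dt.year).all())
```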

sudeep
  • This only extracts the year from the first row in the datum column and will not work to extract the year for all rows. If there are different years in the column, this answer will not extract them. – Trenton McKinney Jan 30 '21 at 18:18
  • Not clear in the question what the final output should be. The answer provides a way to get the year for the first row as an example. – sudeep Jan 30 '21 at 18:20
  • _How can I transform those string dates..._ dates is plural, which implies all the dates not one of the dates. – Trenton McKinney Jan 30 '21 at 18:22
  • Thanks! Solution worked – johnadem Jan 30 '21 at 22:40