3

As an example, I'm trying to slice a field, 'date', containing dates in the format YYYYMMDD into 3 individual fields ('year', 'month', 'day').

I have an approach that assigns each value one at a time, but I assume there is a more efficient way to produce the desired result.

Current solution:

df['year'] = df['date'].astype(str).apply(lambda x: x[:4])
df['month'] = df['date'].astype(str).apply(lambda x: x[4:6])
df['day'] = df['date'].astype(str).apply(lambda x: x[6:8])

The following is an example of one of my attempts to simplify the code:

df['year'], df['month'], df['day'] = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])
jpp
Billy Bob
    Thanks everybody! All solutions work well. I selected MaxU's answer since it can be applied to any fixed width string. – Billy Bob Jun 11 '18 at 20:57

5 Answers

2

You should always convert date columns to pandas datetime objects. You can do this in one of 2 ways:

  1. When you import the data:

 df = pd.read_csv('your-raw_date.csv', parse_dates=[5,7])

NOTE: 5 and 7 are the zero-based positions of the columns that contain the dates.

  2. Apply the to_datetime function to an individual column. See this post for details.

Second, you can import the standard datetime module, although the .dt accessor used below is part of pandas and works without it:

import datetime as dt

Once the column has a datetime dtype, you can do things like:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

You can even do other useful things, like:

df['signup_date_dayofweek'] = df['date'].dt.dayofweek

See the full documentation of the datetime module.

zafrin
1

Convert the strings to datetime, then use the .dt accessor attributes of the datetime column:

df.date = pd.to_datetime(df.date, format='%Y%m%d')
df['year'], df['month'], df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day

After you convert the column to datetime, you may find that extracting the year, month and day into separate columns is not even necessary, since they are readily available through the date column's .dt attributes.
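To illustrate that last point, here is a minimal sketch that filters and groups directly on the .dt attributes without creating extra columns. The sample values and the variable names `june_or_later` and `counts_by_month` are assumptions for illustration only.

```python
import pandas as pd

# Sample data assumed for illustration (YYYYMMDD stored as integers).
df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})

# Convert via strings so the explicit format applies regardless of the input dtype.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# Filter rows by month without materialising a 'month' column.
june_or_later = df[df["date"].dt.month >= 6]

# Count rows per month, grouping on the accessor directly.
counts_by_month = df.groupby(df["date"].dt.month).size()
```

Whether separate columns are still worth keeping depends on how often the parts are reused; the accessor recomputes them on each call.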

Haleemur Ali
1

Since it looks like they are already integers, forget about all of the string formatting.

df['year'] = df.date//10**4
df['month'] = df.date//100%100
df['day'] = df.date%100
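As a worked check of the arithmetic above, here is a minimal sketch on a small sample frame. The sample values are assumed for illustration, and the column is assumed to hold integers: integer division by 10**4 drops the last four digits, `// 100 % 100` keeps the middle two, and `% 100` keeps the last two.

```python
import pandas as pd

# Sample data assumed for illustration; the column must be an integer dtype.
df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})

df["year"] = df["date"] // 10**4        # 20180131 // 10000 -> 2018
df["month"] = df["date"] // 100 % 100   # 201801 % 100 -> 1
df["day"] = df["date"] % 100            # 20180131 % 100 -> 31
```

Note that the resulting columns are integers, so leading zeros (as in month "01") are not preserved, unlike with the string-slicing approach.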
ALollz
1

Assuming the date column is of a string (object) dtype:

In [18]: df
Out[18]:
       date
0  20180131
1  20180611
2  20180513

In [19]: df.dtypes
Out[19]:
date    object
dtype: object

solution:

In [22]: df[['year','month','day']] = df.date.str.extract(r'(\d{4})(\d{2})(\d{2})').astype(int)

In [23]: df
Out[23]:
       date  year  month  day
0  20180131  2018      1   31
1  20180611  2018      6   11
2  20180513  2018      5   13

In [24]: df.dtypes
Out[24]:
date     object
year      int32
month     int32
day       int32
dtype: object

PS if date is of a numeric dtype, then I'd choose @ALollz's solution...
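A possible variation on the regex approach, sketched under the same assumption of a string column: named capture groups let str.extract label the output columns itself, which may be convenient when applying the idea to other fixed-width strings. The variable name `parts` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Sample data assumed for illustration; the column holds fixed-width strings.
df = pd.DataFrame({"date": ["20180131", "20180611", "20180513"]})

# Each named group (?P<name>...) becomes a column with that name.
pattern = r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
parts = df["date"].str.extract(pattern).astype(int)

# Attach the extracted columns to the original frame.
df = df.join(parts)
```

For a different fixed-width layout, only the group names and widths in the pattern would need to change.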

MaxU - stand with Ukraine
1

Here's a functional solution using operator.attrgetter:

from operator import attrgetter

import pandas as pd

df = pd.DataFrame({'Date': [20180131, 20180611, 20180513]})

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')

parts = ('year', 'month', 'day')

df['Year'], df['Month'], df['Day'] = zip(*df['Date'].apply(attrgetter(*parts)).values)

print(df)

        Date  Year  Month  Day
0 2018-01-31  2018      1   31
1 2018-06-11  2018      6   11
2 2018-05-13  2018      5   13
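The same zip(*...) unpacking can also be combined with the string slicing from the question, without converting to datetime. This is only a sketch with assumed sample values; `as_text` is a name introduced here for illustration. It builds one (year, month, day) tuple per row and then transposes the rows into three columns.

```python
import pandas as pd

# Sample data assumed for illustration (YYYYMMDD stored as integers).
df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})

as_text = df["date"].astype(str)

# One tuple per row, then zip(*rows) transposes them into three column tuples.
df["year"], df["month"], df["day"] = zip(
    *[(s[:4], s[4:6], s[6:8]) for s in as_text]
)
```

The resulting columns hold strings, so leading zeros such as "01" are kept.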
jpp