I have data that looks like this.

VendorID    lpep_pickup_datetime    lpep_dropoff_datetime   store_and_fwd_flag
2   1/1/2018 0:18:50    1/1/2018 12:24:39 AM    N
2   1/1/2018 0:30:26    1/1/2018 12:46:42 AM    N
2   1/1/2018 0:07:25    1/1/2018 12:19:45 AM    N
2   1/1/2018 0:32:40    1/1/2018 12:33:41 AM    N
2   1/1/2018 0:32:40    1/1/2018 12:33:41 AM    N
2   1/1/2018 0:38:35    1/1/2018 1:08:50 AM N
2   1/1/2018 0:18:41    1/1/2018 12:28:22 AM    N
2   1/1/2018 0:38:02    1/1/2018 12:55:02 AM    N
2   1/1/2018 0:05:02    1/1/2018 12:18:35 AM    N
2   1/1/2018 0:35:23    1/1/2018 12:42:07 AM    N

So, I converted df.lpep_pickup_datetime to datetime, but originally it comes in as a string. I'm not sure which one is easier to work with. I want to append 5 fields onto my current dataframe: year, month, day, weekday, and hour.

I tried this:

df['Year']=[d.split('-')[0] for d in df.lpep_pickup_datetime]
df['Month']=[d.split('-')[1] for d in df.lpep_pickup_datetime]
df['Day']=[d.split('-')[2] for d in df.lpep_pickup_datetime]

That gives me this error: AttributeError: 'Timestamp' object has no attribute 'split'

I tried this:

df2 = pd.DataFrame(df.lpep_pickup_datetime.dt.strftime('%m-%d-%Y-%H').str.split('/').tolist(),
                   columns=['Month', 'Day', 'Year', 'Hour'],dtype=int)

df = pd.concat((df,df2),axis=1)

That gives me this error: AssertionError: 4 columns passed, passed data had 1 columns

Basically, I want to parse df.lpep_pickup_datetime into year, month, day, weekday, and hour, appending each to the same dataframe. How can I do that?

Thanks!!

martineau
ASH
  • Does this answer your question? [Extracting just Month and Year separately from Pandas Datetime column](https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column) – AMC Feb 08 '20 at 01:21

2 Answers

Here you go. First I create a sample dataset and copy its date column under the name you use, so you can paste the code as is. Pandas has extensive built-in time-series functionality, so you don't actually need to import datetime. The pandas time-series documentation covers this in more detail.

import pandas as pd

# Build an hourly sample dataset, then copy the column under the name used in the question.
date_rng = pd.date_range(start='1/1/2018', end='4/01/2018', freq='H')
df = pd.DataFrame(date_rng, columns=['date'])
df['lpep_pickup_datetime'] = df['date']

# Each .dt property returns one component of the timestamp.
df['year'] = df['lpep_pickup_datetime'].dt.year
df['month'] = df['lpep_pickup_datetime'].dt.month
df['weekday'] = df['lpep_pickup_datetime'].dt.weekday  # Monday=0, Sunday=6
df['day'] = df['lpep_pickup_datetime'].dt.day
df['hour'] = df['lpep_pickup_datetime'].dt.hour
print(df)

Output:

                    date lpep_pickup_datetime  year  month  weekday  day  hour
0    2018-01-01 00:00:00  2018-01-01 00:00:00  2018      1        0    1     0
1    2018-01-01 01:00:00  2018-01-01 01:00:00  2018      1        0    1     1
2    2018-01-01 02:00:00  2018-01-01 02:00:00  2018      1        0    1     2
3    2018-01-01 03:00:00  2018-01-01 03:00:00  2018      1        0    1     3
4    2018-01-01 04:00:00  2018-01-01 04:00:00  2018      1        0    1     4
...                  ...                  ...   ...    ...      ...  ...   ...
2156 2018-03-31 20:00:00  2018-03-31 20:00:00  2018      3        5   31    20
2157 2018-03-31 21:00:00  2018-03-31 21:00:00  2018      3        5   31    21
2158 2018-03-31 22:00:00  2018-03-31 22:00:00  2018      3        5   31    22
2159 2018-03-31 23:00:00  2018-03-31 23:00:00  2018      3        5   31    23
2160 2018-04-01 00:00:00  2018-04-01 00:00:00  2018      4        6    1     0

EDIT: Since this is not working (as stated in the comments on this answer), I believe your data is being parsed with the wrong format. Try this before applying anything else:

df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'], format='%d/%m/%Y %H:%M:%S')

Note that %Y matches the four-digit years in your sample. If your dates are month-first, swap the first two directives ('%m/%d/%Y %H:%M:%S'). Once the format is recognized properly, you should have no trouble using dt.year, dt.month, dt.hour, dt.day, and dt.weekday.
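A minimal end-to-end sketch of that flow, using three pickup strings copied from the question. The day/month order cannot be determined from a 1/1/2018 sample, so the format string below is an assumption; adjust it to match the real file.

```python
import pandas as pd

# Sample strings copied from the question's data.
df = pd.DataFrame({
    "lpep_pickup_datetime": [
        "1/1/2018 0:18:50",
        "1/1/2018 0:30:26",
        "1/1/2018 0:07:25",
    ]
})

# Assumed format: day-first with a four-digit year.
# Use "%m/%d/%Y %H:%M:%S" instead if the file is month-first.
df["lpep_pickup_datetime"] = pd.to_datetime(
    df["lpep_pickup_datetime"], format="%d/%m/%Y %H:%M:%S"
)

# Append the five requested fields to the same dataframe.
pickup = df["lpep_pickup_datetime"].dt
df["year"] = pickup.year
df["month"] = pickup.month
df["day"] = pickup.day
df["weekday"] = pickup.weekday  # Monday=0, Sunday=6
df["hour"] = pickup.hour

print(df)
```

Passing the format explicitly avoids relying on pandas to guess the layout, and a mismatch raises an error instead of quietly producing wrong components.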

Celius Stingher
  • I don't have 'date_range' so I simply tried this: df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime']) df['year'] = df['lpep_pickup_datetime'].dt.year It seems like that should work, but now all my results are 0 and 1. That's not even close. What could be wrong here? – ASH Sep 24 '19 at 17:04
  • I think the format in which your column is being interpreted is incorrect, which is why `dt.year` gives you output that is way off from what you expected. It seems strange, but I believe checking the dtypes will help us understand whether something is wrong. – Celius Stingher Sep 24 '19 at 17:21
  • This is it: lpep_pickup_datetime datetime64[ns] – ASH Sep 24 '19 at 18:08
  • Nice! It's working now. I guess the 'pd.to_datetime' does all the magic. Thanks for the help. Love it. – ASH Sep 24 '19 at 18:34
  • Yes, I believe pd.to_datetime wasn't interpreting the format the way we wanted, which is why I added it. Glad to help! – Celius Stingher Sep 24 '19 at 18:35
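
The dtype check discussed in the comments can be sketched as follows. The frame names `raw` and `parsed` are hypothetical, and the format string is an assumption about the file's layout; `is_datetime64_any_dtype` is a helper from `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical frame holding the column as plain strings, as it arrives from the file.
raw = pd.DataFrame(
    {"lpep_pickup_datetime": ["1/1/2018 0:18:50", "1/1/2018 0:30:26"]}
)

# Strings are not a datetime dtype, so the .dt properties are not usable yet.
print(raw.dtypes)
print(is_datetime64_any_dtype(raw["lpep_pickup_datetime"]))

# Assumed format (day-first, four-digit year); adjust to the real data.
parsed = raw.assign(
    lpep_pickup_datetime=pd.to_datetime(
        raw["lpep_pickup_datetime"], format="%d/%m/%Y %H:%M:%S"
    )
)

# After parsing, the column is datetime64 and .dt.year and friends work.
print(parsed.dtypes)
print(is_datetime64_any_dtype(parsed["lpep_pickup_datetime"]))
```

A datetime64 dtype only confirms that parsing happened, not that day and month were read in the intended order, so it is still worth checking a few parsed rows against the raw strings.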

Give this a go. Since your dates are in the datetime dtype already, just use the datetime properties to extract each part.

import pandas as pd
from datetime import datetime as dt

# Creating a fake dataset of dates.
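dates = [dt.now().strftime('%d/%m/%Y %H:%M:%S') for _ in range(10)]
df = pd.DataFrame({'lpep_pickup_datetime': dates})
# Pass the format explicitly so day-first strings are not misread as month-first.
df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'], format='%d/%m/%Y %H:%M:%S')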

# Parse each date into its parts and store as a new column.
df['month'] = df['lpep_pickup_datetime'].dt.month
df['day'] = df['lpep_pickup_datetime'].dt.day
df['year'] = df['lpep_pickup_datetime'].dt.year
# ... and so on ...

Output:

  lpep_pickup_datetime  month  day  year
0  2019-09-24 16:46:10      9   24  2019
1  2019-09-24 16:46:10      9   24  2019
2  2019-09-24 16:46:10      9   24  2019
3  2019-09-24 16:46:10      9   24  2019
4  2019-09-24 16:46:10      9   24  2019
5  2019-09-24 16:46:10      9   24  2019
6  2019-09-24 16:46:10      9   24  2019
7  2019-09-24 16:46:10      9   24  2019
8  2019-09-24 16:46:10      9   24  2019
9  2019-09-24 16:46:10      9   24  2019
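
The "and so on" part, covering the weekday and hour the question also asks for, might look like the sketch below. The two timestamps are made up for illustration, and the `weekday_name` column is an optional extra that was not requested.

```python
import pandas as pd

# Two illustrative timestamps, already in datetime64 form.
df = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            ["2019-09-24 16:46:10", "2019-09-25 08:05:00"]
        )
    }
)

ts = df["lpep_pickup_datetime"].dt
df["weekday"] = ts.weekday          # integer, Monday=0 through Sunday=6
df["weekday_name"] = ts.day_name()  # e.g. "Tuesday"; optional extra column
df["hour"] = ts.hour                # 0 through 23

print(df)
```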
S3DEV
  • I don't have a 'dates' field like you show in your example, so I simply did this: df['lpep_pickup_datetime'] = pd.DataFrame({'lpep_pickup_datetime': df}) Now, I'm getting this error. ValueError: If using all scalar values, you must pass an index – ASH Sep 24 '19 at 17:05
  • @asher Sorry if I wasn't clear. The first block is just me creating a dataframe to use in the example. Just use the block which says, "# Parse each date into ...". This is where the dates are parsed into each part (day, month, year, etc) and put into new columns. – S3DEV Sep 24 '19 at 18:43
  • Got it. Thanks!! – ASH Sep 24 '19 at 20:07