79

Assume I have a DataFrame sales of timestamp values:

timestamp               sales_office
2014-01-01 09:01:00     Cincinnati
2014-01-01 09:11:00     San Francisco
2014-01-01 15:22:00     Chicago
2014-01-01 19:01:00     Chicago

I would like to create a new column time_hour. I can create it by writing a short function as so and using apply() to apply it iteratively:

def hr_func(ts):
    return ts.hour

sales['time_hour'] = sales['timestamp'].apply(hr_func)

I would then see this result:

timestamp               sales_office         time_hour
2014-01-01 09:01:00     Cincinnati           9
2014-01-01 09:11:00     San Francisco        9
2014-01-01 15:22:00     Chicago              15
2014-01-01 19:01:00     Chicago              19

What I'd like to achieve is some shorter transformation like this (which I know is erroneous but gets at the spirit):

sales['time_hour'] = sales['timestamp'].hour

Obviously the column is of type Series and as such doesn't have those attributes, but it seems there's a simpler way to make use of matrix operations.

Is there a more-direct approach?

Daniel Black
  • 968
  • 1
  • 7
  • 11
  • 13
    ``pd.Datetimeindex(sales['timestamp']).hour`` will be MUCH faster than using ``.apply`` – Jeff Aug 05 '14 at 10:21
  • This is the way I'll go. I was looking for a way to convert those columns to a datetimeindex-like object using `pd.to_datetime` iteratively. But the entire column itself needs to be a datetimeindex object, which isn't achieved with `pd.to_datetime`. – Daniel Black Aug 06 '14 at 16:04
  • you can do this with ``pd.to_datetime(column.values,box=True)`` as well (as somepoint I think will add a ``Series.to_index()`` method to basically do this directly. This is all vectorized. – Jeff Aug 06 '14 at 16:07
  • @JohnE not sure where you are talking about – Jeff Mar 26 '15 at 22:13
  • 10
    @Jeff -- ```Datetimeindex``` should be ```DatetimeIndex```, right? (capital I in Index) – JohnE Mar 26 '15 at 23:05
  • you are right - can't edit the comments though :( – Jeff Mar 26 '15 at 23:22
  • What if the column is not an index? – Nickpick Nov 18 '16 at 12:31
  • @nickpick I assume bunhae's answer would apply to index and non-index columns. – Daniel Black Dec 05 '16 at 02:27

8 Answers8

72

Assuming timestamp is the index of the data frame, you can just do the following:

hours = sales.index.hour

If you want to add that to your sales data frame, just do:

import pandas as pd
pd.concat([sales, pd.DataFrame(hours, index=sales.index)], axis = 1)

Edit: If you have several columns of datetime objects, it's the same process. If you have a column ['date'] in your data frame, and assuming that 'date' has datetime values, you can access the hour from the 'date' as:

hours = sales['date'].hour

Edit2: If you want to adjust a column in your data frame you have to include dt:

sales['datehour'] = sales['date'].dt.hour

Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46
Sudipta Basak
  • 3,089
  • 2
  • 18
  • 14
  • I unfortunately set my example up poorly. My actual conundrum includes several columns of `datetime` values. I'll be calculating elapsed _business hours_ between the timestamps, so will be extracting several sets of hour-unit values. – Daniel Black Aug 05 '14 at 00:48
  • 26
    Assuming 'date' is a column `hours = sales['date'].hour1` will give an attribute error: `AttributeError: 'Series' object has no attribute 'hour'`
    – Lucas Mar 12 '17 at 19:30
  • 21
    @Lucas right, if it's a column then the answer would be hours=sales['date'].dt.hour – famargar May 16 '17 at 10:55
  • 1
    @famargar thank you so much adding dt save my life. What's the meaning of dt btw? – overloading Dec 08 '17 at 05:47
  • 1
    @overloading probably the abbreviation of `datetime` – Mathias711 Mar 29 '18 at 12:50
  • 1
    I increasingly found that `pandas` just has too many layers to peel. It's not a straightforward search to know that you need to add `dt` to access a `datetime` column; and if it a column datetime string values, `dt` won't work either. – stucash Jul 16 '21 at 09:10
58

For posterity: as of 0.15.0, there is a handy .dt accessor you can use to pull such values from a datetime/period series (in the above case, just sales.timestamp.dt.hour!

iff_or
  • 880
  • 1
  • 11
  • 24
  • The link provided didn't work for me. This one did [.dt accessor](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors). – Niklas Dec 12 '16 at 17:24
20

You can use a lambda expression, e.g:

sales['time_hour'] = sales.timestamp.apply(lambda x: x.hour)
Community
  • 1
  • 1
Bob Hannon
  • 201
  • 1
  • 2
16

You can try this:

sales['time_hour'] = pd.to_datetime(sales['timestamp']).dt.hour
Pika Supports Ukraine
  • 3,612
  • 10
  • 26
  • 42
TCO
  • 167
  • 1
  • 8
3

Since the quickest, shortest answer is in a comment (from Jeff) and has a typo, here it is corrected and in full:

sales['time_hour'] = pd.DatetimeIndex(sales['timestamp']).hour
Idiot Tom
  • 449
  • 4
  • 5
2

Now we can use:

sales['time_hour'] = sales['timestamp'].apply(lambda x: x.hour)
4b0
  • 21,981
  • 30
  • 95
  • 142
0

Here is a simple solution:

import pandas as pd
# convert the timestamp column to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# extract hour from the timestamp column to create an time_hour column
df['time_hour'] = df['timestamp'].dt.hour
DINA TAKLIT
  • 7,074
  • 10
  • 69
  • 74
0

You could also create a function where, if needed, you could also extract month, year, etc. but 'timestamp' must be the index.

for i in range(len(sales)):
  position = sales.index[i]
  hour = position.hour
  month = position.month
  sales.loc[position, 'hour'] = hour
  sales.loc[position, 'month'] = month
Trrmis
  • 1