24

I need a datetime column in seconds, everywhere (including the docs) is saying that I should use Series.dt.total_seconds() but it can't find the function. I'm assuming I have the wrong version of something but I don't...

pip freeze | grep pandas
pandas==0.20.3

python --version
Python 3.5.3

This is all within a virtualenv that has worked without fault for a long time, and the other Series.dt functions work. Here's the code:

from pandas import Series
from datetime import datetime

s = Series([datetime.now() for _ in range(10)])

0   2017-08-25 15:55:25.079495
1   2017-08-25 15:55:25.079504
2   2017-08-25 15:55:25.079506
3   2017-08-25 15:55:25.079508
4   2017-08-25 15:55:25.079509
5   2017-08-25 15:55:25.079510
6   2017-08-25 15:55:25.079512
7   2017-08-25 15:55:25.079513
8   2017-08-25 15:55:25.079514
9   2017-08-25 15:55:25.079516
dtype: datetime64[ns]

s.dt
<pandas.core.indexes.accessors.DatetimeProperties object at 0x7f5a686507b8>

s.dt.minute
0    55
1    55
2    55
3    55
4    55
5    55
6    55
7    55
8    55
9    55
dtype: int64

s.dt.total_seconds()
AttributeError: 'DatetimeProperties' object has no attribute 'total_seconds'

I've also tested this on a second machine and get the same result. Any ideas what I'm missing?

JakeCowton
  • 1,374
  • 5
  • 15
  • 35

4 Answers4

33

total_seconds is a member of timedelta not datetime

Hence the error

You maybe be wanting dt.second

This returns the second component which is different to total_seconds

So you need to perform some kind of arithmetic operation such as deleting something against this in order to generate a series of timedeltas, then you can do dt.total_seconds

Example:

In[278]:
s = s - pd.datetime.now()
s

Out[278]: 
0   -1 days +23:59:46.389639
1   -1 days +23:59:46.389639
2   -1 days +23:59:46.389639
3   -1 days +23:59:46.389639
4   -1 days +23:59:46.389639
5   -1 days +23:59:46.389639
6   -1 days +23:59:46.389639
7   -1 days +23:59:46.389639
8   -1 days +23:59:46.389639
9   -1 days +23:59:46.389639
dtype: timedelta64[ns]

In[279]:
s.dt.total_seconds()

Out[279]: 
0   -13.610361
1   -13.610361
2   -13.610361
3   -13.610361
4   -13.610361
5   -13.610361
6   -13.610361
7   -13.610361
8   -13.610361
9   -13.610361
dtype: float64
EdChum
  • 376,765
  • 198
  • 813
  • 562
10

Actually I just realized you could just convert to integer (in case you want the total seconds)!

>>> df.time_column.astype(int)
0     1592294727721713000
1     1592294727650772000
2     1592294727682569000
3     1592294727712650000
PascalVKooten
  • 20,643
  • 17
  • 103
  • 160
7

Alternatively, if you really want to have seconds (since 1970 epoch), you can try this

import pandas as pd
from datetime import datetime
import time

df = pd.DataFrame({'datetime' : [datetime(2012, 11, 19, 12, 40, 10), 
                                datetime(2012, 11, 19, 12, 35, 10),
                                datetime(2012, 11, 19, 12, 30, 10),
                                datetime(2012, 11, 19, 12, 25, 10)
                                ]})
df['seconds'] = [time.mktime(t.timetuple()) for t in df.datetime]
df['back_to_date_time'] = [datetime.utcfromtimestamp(t) for t in df.seconds]

>>>>df
Out[2]: 
             datetime       seconds   back_to_date_time
0 2012-11-19 12:40:10  1.353325e+09 2012-11-19 11:40:10
1 2012-11-19 12:35:10  1.353325e+09 2012-11-19 11:35:10
2 2012-11-19 12:30:10  1.353325e+09 2012-11-19 11:30:10
3 2012-11-19 12:25:10  1.353324e+09 2012-11-19 11:25:10

or you can look here How can I convert a datetime object to milliseconds since epoch (unix time) in Python?

mortysporty
  • 2,749
  • 6
  • 28
  • 51
  • 2
    This is what I wanted. I ended up going for a quick fix and just subtracting 1st Jan 1970 and then using `total_seconds()` on the resulting `timedelta`. (I really don't like datetime64 format). Thanks though! – JakeCowton Aug 25 '17 at 17:14
0

From pandas.Series.dt.total_seconds, the document explains its usage:

Return total duration of each element expressed in seconds.

This method is available directly on TimedeltaArray, TimedeltaIndex and on Series containing timedelta values under the .dt namespace.

You can think of total duration as time difference between a start time and an end time. Timedeltas is just designed for this. It is an absolute time duration, aimed to represent differences in times.

So when you say you want to calculate the total seconds, you must make sure what is the duration. Is it the time between two datetimes or is it time units like 2 days, 1 hour?


Let's go back to your question, you have construct a datetime64 Series and you want to calculate the total seconds.

s = pd.Series(pd.date_range(start='2018-05-26 08:00', periods=3, freq='1h'))
print(s)

0   2018-05-26 08:00:00
1   2018-05-26 09:00:00
2   2018-05-26 10:00:00
dtype: datetime64[ns]

What's the duration? It doesn't make sense to calculate total seconds from a time like 2018-05-26 08:00:00. A time point doesn't even represent 1 second.


If you want to calculate the total seconds to the start of day, you can do

ts = pd.to_timedelta(s.dt.time.astype(str)).dt.total_seconds()
print(ts)

0    28800.0
1    32400.0
2    36000.0
dtype: float64

If you want to calculate the total seconds since Unix epoch (January 1, 1970, 00:00:00 (UTC)),

ts = s.values.astype(int)
print(ts)

[1527321600000000000 1527325200000000000 1527328800000000000]

If you want to calculate the total seconds from the start of year

ts = s.sub(pd.to_datetime('2018')).dt.total_seconds()
print(ts)

0    12556800.0
1    12560400.0
2    12564000.0
dtype: float64
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52