9

Following in the spirit of this answer, I attempted the following to convert a DataFrame column of datetimes to a column of seconds since the epoch.

df['date'] = (df['date']+datetime.timedelta(hours=2)-datetime.datetime(1970,1,1))
df['date'].map(lambda td:td.total_seconds())

The second command causes the following error which I do not understand. Any thoughts on what might be going on here? I replaced map with apply and that didn't help matters.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-99-7123e823f995> in <module>()
----> 1 df['date'].map(lambda td:td.total_seconds())

/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in map(self, arg, na_action)
   1932             return self._constructor(new_values, index=self.index).__finalize__(self)
   1933         else:
-> 1934             mapped = map_f(values, arg)
   1935             return self._constructor(mapped, index=self.index).__finalize__(self)
   1936 

/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:43628)()

<ipython-input-99-7123e823f995> in <lambda>(td)
----> 1 df['date'].map(lambda td:td.total_seconds())

AttributeError: 'float' object has no attribute 'total_seconds'
Community
  • 1
  • 1
Chris
  • 3,109
  • 7
  • 29
  • 39
  • It seems the 'date' column may not be a datetime64 in the first place? – Zeugma Oct 23 '13 at 22:31
  • 1
    Does the column contain any missing values? Missing values usually cause pandas Series to be cast to floats, causing weirdness when you try to interpret them as datetimes. – Abe Oct 23 '13 at 23:04
  • @Abe Indeed you were right. Turns out there is some missing data in some of the records. Not what I anticipated... Ack. – Chris Oct 24 '13 at 01:15

2 Answers2

15

Update:

In 0.15.0 Timedeltas became a full-fledged dtype.

So this becomes possible (as well as the methods below)

In [45]: s = Series(pd.timedelta_range('1 day',freq='1S',periods=5))                         

In [46]: s.dt.components
Out[46]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     1      0        0        1             0             0            0
2     1      0        0        2             0             0            0
3     1      0        0        3             0             0            0
4     1      0        0        4             0             0            0

In [47]: s.astype('timedelta64[s]')
Out[47]: 
0    86400
1    86401
2    86402
3    86403
4    86404
dtype: float64

Original Answer:

I see that you are on master (and 0.13 is coming out very shortly), so assuming you have numpy >= 1.7. Do this. See here for the docs (this is frequency conversion)

In [5]: df = DataFrame(dict(date = date_range('20130101',periods=10)))

In [6]: df
Out[6]: 
                 date
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00

In [7]: df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)
Out[7]: 
0   15706 days, 02:00:00
1   15707 days, 02:00:00
2   15708 days, 02:00:00
3   15709 days, 02:00:00
4   15710 days, 02:00:00
5   15711 days, 02:00:00
6   15712 days, 02:00:00
7   15713 days, 02:00:00
8   15714 days, 02:00:00
9   15715 days, 02:00:00
Name: date, dtype: timedelta64[ns]

In [9]: (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)) / np.timedelta64(1,'s')
Out[9]: 
0    1357005600
1    1357092000
2    1357178400
3    1357264800
4    1357351200
5    1357437600
6    1357524000
7    1357610400
8    1357696800
9    1357783200
Name: date, dtype: float64

The contained values are np.timedelta64[ns] objects, they don't have the same methods as timedelta objects, so no total_seconds().

In [10]: s = (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1))

In [11]: s[0]
Out[11]: numpy.timedelta64(1357005600000000000,'ns')

You can astype them to int, and you get back a ns unit.

In [12]: s[0].astype(int)
Out[12]: 1357005600000000000

You can do this as well (but only on an individual unit element).

In [18]: s[0].astype('timedelta64[s]')
Out[18]: numpy.timedelta64(1357005600,'s')
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • 1
    Perfect Jeff, thank you! Once I got rid of the rows with missing data, I could directly apply this to the new problem. ;-) – Chris Oct 24 '13 at 01:26
  • you don't need to remove missing values (NaT); they will be returned as nan – Jeff Oct 24 '13 at 01:29
  • I don't have time right now to edit (and re-verify) the answer, but in fact version 0.13 now allows 'astype' operations to be applied to an entire Series. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-deltas-conversions, or http://pandas.pydata.org/pandas-docs/stable/whatsnew.html and do a find-on-page for 'astype'. – TimStaley Mar 11 '14 at 19:01
  • I'm searching for a method to convert `Series` of `Timedelta`s to `seconds` for plotting. Your approach with `/ pd.Timedelta(seconds=1)` is a nice one. Stiil, I wonder if there is a better method. What do you think? – soupault May 22 '15 at 14:29
  • 2
    this answer actually is pretty outdated; Timedeltas became fully fledged types in 0.15.0. I will add another method. – Jeff May 22 '15 at 14:41
1

Since recent versions of Pandas, you can do:

import pandas as pd

# create a dataframe from 2023-05-06 to 2023-06-04
df = pd.DataFrame({'date': pd.date_range('2023-05-26', periods=10, freq='D')})

df['timestamp'] = (df['date'].add(pd.DateOffset(hours=2))  # add hour offset
                             .sub(pd.Timestamp(0))  # subtract 1970-1-1
                             .dt.total_seconds()  # extract total of seconds
                             .astype(int))  # downcast float64 to int64

Output:

>>> df
        date   timestamp
0 2023-05-26  1685066400
1 2023-05-27  1685152800
2 2023-05-28  1685239200
3 2023-05-29  1685325600
4 2023-05-30  1685412000
5 2023-05-31  1685498400
6 2023-06-01  1685584800
7 2023-06-02  1685671200
8 2023-06-03  1685757600
9 2023-06-04  1685844000

The key is to subtract the origin (pd.Timestamp(0)) to each dates (DatetimeIndex) then use the dt accessor to extract from the result (TimedeltaIndex) the number of seconds. You can also downcast the numeric result (float64 to int64).

Corralien
  • 109,409
  • 8
  • 28
  • 52