
I have a df with one column displaying time in seconds. I'd like to convert those to hh:mm or hh:mm:ss.

If the time exceeds 24 hours, I'd still like it in hh:mm:ss format, not 'n days hh:mm:ss'.

To provide an example:

import pandas as pd
import numpy as np
import datetime

ts1 = ['21000', np.nan, '40000', np.nan, '49000', '100000']
ts2 = [0, 2, 'yy', 3, 'yy', 'yy']
ts3 = [0, 2, np.nan, 3, 4, np.nan]
d =  {'X': ts1, 'Y': ts2, 'Z': ts3}
df = pd.DataFrame(data=d)

Output:

        X   Y    Z
0   21000   0  0.0
1     NaN   2  2.0
2   40000  yy  NaN
3     NaN   3  3.0
4   49000  yy  4.0
5  100000  yy  NaN

I can perform this on a single value using:

t = str(datetime.timedelta(seconds=21000))

Output t:

5:50:00    
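For comparison, the same single-value call on a value over 24 hours shows the formatting problem described above: str() of a datetime.timedelta folds whole days into a "N day(s), " prefix instead of letting the hour count go past 24.

```python
import datetime

# Under 24 hours: plain h:mm:ss
print(str(datetime.timedelta(seconds=21000)))   # 5:50:00
# Over 24 hours: whole days become a separate "1 day, " prefix
print(str(datetime.timedelta(seconds=100000)))  # 1 day, 3:46:40
```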

But how would I pass the same function to an entire column?

# Attempt (fails: timedelta() expects a single number, not a whole Series)
# t_col = str(datetime.timedelta(seconds=df['X']))

Intended Output:

          X   Y    Z
0   5:50:00   0  0.0
1       NaN   2  2.0
2  11:06:40  yy  NaN
3       NaN   3  3.0
4  13:36:40  yy  4.0
5  27:46:40  yy  NaN

2 Answers


A more step-by-step approach

First, create the new column and get rid of the NaN values by filling them with 0:

In [156]: df['new_column'] = df.X.fillna(0)

In [157]: df
Out[157]:
       X   Y    Z new_column
0  21000   0  0.0      21000
1    NaN   2  2.0          0
2  40000  yy  NaN      40000
3    NaN   3  3.0          0
4  49000  yy  4.0      49000
5  80000  yy  NaN      80000

Then create the timedeltas with a lambda based on the code you wrote. Note that each value must be cast to int, because column X holds strings and the seconds parameter of timedelta needs a number.

In [158]: df['new_column'] = df.apply(lambda x: datetime.timedelta(seconds=int(x['new_column'])), axis=1)

In [159]: df
Out[159]:
       X   Y    Z new_column
0  21000   0  0.0   05:50:00
1    NaN   2  2.0   00:00:00
2  40000  yy  NaN   11:06:40
3    NaN   3  3.0   00:00:00
4  49000  yy  4.0   13:36:40
5  80000  yy  NaN   22:13:20
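Since the lambda only reads one column, a variant of the same idea (a sketch, not the answer's exact code) applies it to that Series directly instead of iterating over whole rows with axis=1:

```python
import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['21000', np.nan, '40000', np.nan, '49000', '100000']})

# Fill NaN with 0, turn the strings into numbers, then build one timedelta per value
df['new_column'] = pd.to_numeric(df['X'].fillna(0)).apply(
    lambda s: datetime.timedelta(seconds=int(s))
)
print(df)
```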

Using a one-liner

Building on the above, a vectorized version fills the NaNs, converts the whole series to numbers with pd.to_numeric, and then converts it to timedeltas with pd.to_timedelta.

In [173]: df['td'] = pd.to_timedelta(pd.to_numeric(df.X.fillna(0)), unit='s')

In [174]: df
Out[174]:
       X   Y    Z new_column       td
0  21000   0  0.0   05:50:00 05:50:00
1    NaN   2  2.0   00:00:00 00:00:00
2  40000  yy  NaN   11:06:40 11:06:40
3    NaN   3  3.0   00:00:00 00:00:00
4  49000  yy  4.0   13:36:40 13:36:40
5  80000  yy  NaN   22:13:20 22:13:20

This vectorized approach should be faster, since apply calls a Python function once per row.
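To check the speed claim yourself, here is a rough benchmark sketch. The synthetic data (50,000 random second counts stored as strings) is an assumption for illustration, and timings will vary with machine and pandas version.

```python
import datetime
import timeit
import numpy as np
import pandas as pd

# Synthetic data (assumption): seconds stored as strings, like column X
rng = np.random.default_rng(0)
X = pd.Series(rng.integers(0, 200_000, size=50_000).astype(str))

def with_apply():
    # One Python-level timedelta() call per element
    return pd.to_numeric(X).apply(lambda s: datetime.timedelta(seconds=int(s)))

def vectorized():
    # A single conversion over the whole array
    return pd.to_timedelta(pd.to_numeric(X), unit='s')

print('apply:     ', timeit.timeit(with_apply, number=3))
print('vectorized:', timeit.timeit(vectorized, number=3))
```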

As per your comment, to keep the NaN rows missing instead of showing 00:00:00, you can use this:

# Restore NaT wherever X was NaN; notna() is safer than an `is not np.NaN` identity check
df['td'] = df['td'].where(df['X'].notna())
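Alternatively (a sketch, not part of the original answer): fillna(0) is what introduces the 00:00:00 rows in the first place. pd.to_timedelta already maps NaN to NaT, so skipping the fill keeps missing values distinct from a genuine zero duration, which is the ambiguity raised in the comments.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['21000', np.nan, '40000', np.nan, '49000', '100000']})

# to_numeric turns the strings into floats (NaN stays NaN);
# to_timedelta then turns NaN into NaT rather than 0 seconds
df['td'] = pd.to_timedelta(pd.to_numeric(df['X']), unit='s')
print(df)
```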
aydow
  • Also, can we remove the 00:00:00 before or after converting? –  Jun 20 '18 at 05:19
  • @PeterJames123, you're redefining the question. this is beyond the scope of what you initially asked – aydow Jun 20 '18 at 05:26
  • Not really? The NaNs were fine. Now I need to remove those. What if the time is actually 00:00:00? I won't be able to differentiate. –  Jun 20 '18 at 05:32
  • I was referring to the comment that you deleted, where you were asking me to code for times > 24 hours. Your first question is within the scope and I have updated my response to answer it. – aydow Jun 20 '18 at 06:09
  • Aww, sorry. I could add those. All good. I'll just do a .replace. –  Jun 20 '18 at 06:11

Using pandas.to_timedelta followed by some messy string formatting to convert days into hours:

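def formatter(x):
    # str(Timedelta) looks like '1 days 03:46:40'
    x = str(x)
    # hours = 'hh' field + 24 * days, zero-padded, followed by ':mm:ss'
    return str(int(x[-8:-6])+int(x.split('days')[0])*24).zfill(2) + x[-6:]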

df['TD'] = pd.to_timedelta(df['X'].fillna(0).astype(int), unit='s')\
             .apply(formatter)

print(df)

        X   Y    Z        TD
0   21000   0  0.0  05:50:00
1     NaN   2  2.0  00:00:00
2   40000  yy  NaN  11:06:40
3     NaN   3  3.0  00:00:00
4   49000  yy  4.0  13:36:40
5  100000  yy  NaN  27:46:40
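The formatter above parses the string form of a Timedelta. An alternative that avoids string slicing is sketched below, assuming X holds whole seconds: it works from total_seconds() with divmod, leaves NaN rows as NaN, and can also produce the hh:mm form mentioned in the question. fmt_hms is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['21000', np.nan, '40000', np.nan, '49000', '100000']})

def fmt_hms(td, with_seconds=True):
    """Hypothetical helper: format a Timedelta as hh:mm:ss (or hh:mm), hours allowed past 24."""
    if pd.isna(td):
        return np.nan  # keep missing values missing rather than showing 00:00:00
    hours, rem = divmod(int(td.total_seconds()), 3600)
    minutes, secs = divmod(rem, 60)
    if with_seconds:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{hours:02d}:{minutes:02d}"

td = pd.to_timedelta(pd.to_numeric(df['X']), unit='s')
df['TD'] = td.apply(fmt_hms)
df['HM'] = td.apply(fmt_hms, with_seconds=False)
print(df)
```

Extra keyword arguments to Series.apply (here with_seconds) are passed through to the function.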
jpp