timedelta to string type in pandas dataframe

Question

I have a dataframe df whose first column is timedelta64:

df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0    686 non-null timedelta64[ns]
1    686 non-null object
2    686 non-null object
3    686 non-null object
4    686 non-null object
5    686 non-null object

If I print(df[0][2]), for example, it gives me 0 days 05:01:11. However, I don't want the 0 days field; I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!

Any preference for the data type of the resulting column? – Optimesh, Jun 29 '18 at 12:35
Object type would be fine for me. – Chenrui Su, Jun 29 '18 at 12:38

jezrael · Accepted Answer · 2018-06-29T12:56:56.467

It is possible with string slicing (this answer names the column duration; in the OP's frame it would be df[0]):

df['duration1'] = df['duration'].astype(str).str[-18:-10]

But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.

So it only works correctly for timedeltas shorter than one day.
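The slice `[-18:-10]` counts from the end of the string, so it only lands on `hh:mm:ss` when the text ends in a 9-digit nanosecond suffix, e.g. `0 days 05:01:11.000000000` (the form this slice assumes). A plain-string sketch of that arithmetic, using two hand-written example strings rather than real pandas output:

```python
# The slice used above, applied to two hand-written example strings.
with_ns = "0 days 05:01:11.000000000"  # 18 characters from "05" to the end
without_ns = "0 days 05:01:11"          # same duration, no fractional part

print(with_ns[-18:-10])     # 05:01:11
print(without_ns[-18:-10])  # 0 day  (the slice starts too early and stops too soon)
```

So if the string form in your pandas version lacks the suffix, the slice silently returns the wrong characters; the `total_seconds()`-based formatter later in this answer does not depend on the string form.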

A more general solution is to create a custom format:

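import numpy as np
import pandas as pd

N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
# Absolute differences between random pairs of timestamps -> timedelta64 column
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
                                 np.random.choice(rng, size=N))})

# Slicing solution: only correct for durations under one day
df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    # Fold whole days into the hour count: total seconds -> hours, minutes, seconds
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

df['duration2'] = df['duration'].apply(f)
print(df)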

         duration duration1  duration2
0 2 days 06:00:00  06:00:00   54:00:00
1 2 days 19:30:00  19:30:00   67:30:00
2 1 days 03:00:00  03:00:00   27:00:00
3 0 days 00:00:00  00:00:00    0:00:00
4 4 days 12:00:00  12:00:00  108:00:00
5 1 days 03:00:00  03:00:00   27:00:00
6 0 days 13:30:00  13:30:00   13:30:00
7 1 days 16:30:00  16:30:00   40:30:00
8 0 days 00:00:00  00:00:00    0:00:00
9 1 days 16:30:00  16:30:00   40:30:00
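A quick check of `f` on a few hand-picked durations (not taken from the random data above). Because hours come from `total_seconds()`, whole days roll into the hour count, and fractional seconds are truncated by `int()`. The sketch assumes non-negative durations, as in the `np.abs(...)` example data; `divmod` on a negative total would not give a sensible `h:mm:ss`.

```python
import pandas as pd

def f(x):
    # Split the total duration into hours, minutes and seconds.
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

s = pd.to_timedelta(pd.Series(["2 days 06:00:00", "0 days 13:30:00", "0 days 00:00:09.7"]))
print(s.apply(f).tolist())  # ['54:00:00', '13:30:00', '0:00:09']
```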

How do we do it if we don't know the column names, like when we create a pivot table? – cikatomo, Dec 03 '22 at 23:35
@cikatomo Do you need to apply it to all columns? Then use `df = df.applymap(f)`. – jezrael, Dec 04 '22 at 05:30
@cikatomo No; for a `TimedeltaIndex` you need `rename`, like `df = df.rename(index=f)`. – jezrael, Dec 05 '22 at 07:22
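For the pivot-table case in the comments, here is a sketch assuming the timedeltas sit either in the index or in every column; the small frames are made up for illustration. `rename(index=f)` applies `f` to each index label. For the values, `apply` with `Series.map` is used here because it works across pandas versions, whereas `applymap` was renamed to `DataFrame.map` in pandas 2.1.

```python
import pandas as pd

def f(x):
    # Same h:mm:ss formatter as in the answer: days are folded into hours.
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

# Timedeltas as the row labels (e.g. the index of a pivot table).
by_index = pd.DataFrame(
    {"count": [3, 5]},
    index=pd.to_timedelta(["0 days 05:01:11", "1 days 00:30:00"]),
)
by_index = by_index.rename(index=f)

# Timedeltas in every column, column names not known in advance.
by_value = pd.DataFrame({
    "a": pd.to_timedelta(["0 days 05:01:11"]),
    "b": pd.to_timedelta(["1 days 00:30:00"]),
})
by_value = by_value.apply(lambda col: col.map(f))

print(by_index.index.tolist())    # ['5:01:11', '24:30:00']
print(by_value.iloc[0].tolist())  # ['5:01:11', '24:30:00']
```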

Simon G. · Answer 2 · 2018-11-28T01:07:34.957

Here's a short and robust version using apply():

df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
              if not pd.isnull(x) else ''
)

This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).

If the timediff column does not contain pandas Timedelta objects, you can convert it:

df['timediff'] = pd.to_timedelta(df['timediff'])
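The same idea can be sketched without a per-row lambda by using the vectorised `Series.dt.components` accessor, which returns a DataFrame with `days`, `hours`, `minutes`, `seconds` (and sub-second) columns. The sample values are made up. NaT rows come back as NaN components, so this sketch fills them before casting to int and blanks them out at the end:

```python
import pandas as pd

s = pd.Series(pd.to_timedelta(["0 days 05:01:11", "3 days 05:01:11", pd.NaT]))
c = s.dt.components  # one row per value, one column per field

def two_digits(col):
    # NaN (from NaT) -> 0 so the cast to int succeeds; zero-pad to 2 digits.
    return col.fillna(0).astype(int).map("{:02d}".format)

hms = two_digits(c["hours"]) + ":" + two_digits(c["minutes"]) + ":" + two_digits(c["seconds"])
hms = hms.where(s.notna(), "")  # empty string for missing durations

print(hms.tolist())  # ['05:01:11', '05:01:11', '']
```

Like the `apply` version, this drops whole days rather than folding them into the hours.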

score 3 · Answer 3 · answered Oct 15 '19 at 17:57

datetime.timedelta already formats the way you'd like. The crux of this issue is that pandas internally converts to numpy.timedelta64.

import pandas as pd
from datetime import timedelta

time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)

times = pd.Series([time_1, time_2])

# Times are converted to Numpy timedeltas.
print(times)

# Convert to string after converting to datetime.timedelta.
times = times.apply(
    lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))

print(times)

So, convert to a datetime.timedelta and then to str (to prevent conversion back to numpy.timedelta64) before printing. The output is:

3 days, 0:56:40
0:56:40

0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]

0    3 days, 0:56:40
1            0:56:40
dtype: object

I came here looking for answers to the same question, so I felt I should add further clarification. : )
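The two plain values printed first follow the standard library's own `str(datetime.timedelta)` rules: hours are not zero-padded, and a non-zero day count is kept as an `N days, ` prefix. A standard-library-only sketch with the same durations as the answer:

```python
from datetime import timedelta

# Same durations as in the answer above.
short = timedelta(days=0, seconds=3400)
multi_day = timedelta(days=3, seconds=3400)

print(str(short))      # 0:56:40  (no leading zero on the hour)
print(str(multi_day))  # 3 days, 0:56:40  (days are kept)
```

So this route removes a `0 days` part but not a non-zero day count.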

score 2 · Answer 4 · answered Jun 18 '20 at 06:26


You can convert it into a Python timedelta, then to str and finally back to a Series:

pd.Series(df["duration"].dt.to_pytimedelta().astype(str), name="start_time")

answered by bryant1410

I tried this, but I get AttributeError: 'Timedelta' object has no attribute 'dt'. Is this answer up to date? – niid Feb 01 '22 at 08:28
Seems like your `df["duration"]` object is not a `pd.Series` but a `Timedelta`? If it's a `pd.Series`, it should have `dt`: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.to_pytimedelta.html – bryant1410 Feb 01 '22 at 15:00
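An element-wise variant, sketched with a made-up frame that has a non-default index: `Series.map` with `Timedelta.to_pytimedelta()` keeps the original index, so the result can be assigned back without rebuilding a Series. The same method also works on a single `Timedelta` scalar, which has no `.dt` accessor (the situation in the comment above).

```python
import pandas as pd

df = pd.DataFrame(
    {"duration": pd.to_timedelta(["0 days 05:01:11", "3 days 05:01:11"])},
    index=[10, 20],  # non-default index, to show that it is preserved
)

# Each element is a pd.Timedelta: convert to datetime.timedelta, then to str.
df["duration_str"] = df["duration"].map(lambda td: str(td.to_pytimedelta()))
print(df["duration_str"].tolist())  # ['5:01:11', '3 days, 5:01:11']

# A single Timedelta scalar: call the method directly instead of using .dt.
one = df["duration"].iloc[0]
print(str(one.to_pytimedelta()))  # 5:01:11
```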

Optimesh · Answer 5 · 2018-06-29T13:08:52.570

Given that the OP is OK with an object column (this is a little verbose):

def splitter(td):
    # str(td) looks like '3 days 05:01:11'; keep the last space-separated piece
    td = str(td).split(' ')[-1]
    return td


df['split'] = df['timediff'].apply(splitter)

Basically, we take each value in the timedelta column, convert it to a string, split the string (which creates a list) and take the last item of that list, which is the hh:mm:ss component.

Note that specifying ' ' for what to split by is redundant here.

Alternative one liner:

df['split2'] = df['timediff'].astype('str').str.split().str[-1]

which is very similar, but not very pretty IMHO. Also, the output includes milliseconds, which is not the case in the first solution. I'm not sure what the reason for that is (please comment if you know). If your data is large, it might be worthwhile to time these different approaches.

Thanks! I think once I convert the type to str, everything is relatively easy using split. – Chenrui Su, Jun 29 '18 at 14:22
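A small sketch of `splitter` on hand-picked values. Because it keeps only the text after the last space, it drops a non-zero day count as well. For a negative duration, pandas prints a `-1 days +hh:mm:ss` form, so the result is the offset from the previous day boundary rather than the magnitude. Both cases are worth checking against your own data:

```python
import pandas as pd

def splitter(td):
    # str(pd.Timedelta) looks like '3 days 05:01:11'; keep the last token.
    return str(td).split(' ')[-1]

print(splitter(pd.Timedelta("3 days 05:01:11")))  # 05:01:11
print(splitter(pd.Timedelta(hours=-1)))           # +23:00:00  (str is '-1 days +23:00:00')
```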

score 1 · Answer 6 · answered Mar 05 '22 at 00:16

If you can be sure that your timedelta is less than a day, this might work. To do this in as few lines as possible, I convert the timedelta to a datetime by adding the Unix epoch (pd.to_datetime(0)) and then format the resulting datetime column with .dt.strftime.

df['duration1'] = (df['duration'] + pd.to_datetime(0)).dt.strftime('%H:%M:%S')
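A sketch of the epoch trick on made-up values, assuming the `'%H:%M:%S'` pattern for hours, minutes and seconds. Only the time of day of the resulting datetime is formatted, so a 25-hour duration wraps to `01:00:00`, which is why this approach is limited to durations under one day:

```python
import pandas as pd

s = pd.to_timedelta(pd.Series(["0 days 05:01:11", "1 days 01:00:00"]))

# Adding the Unix epoch turns each timedelta into a datetime on 1970-01-01 or later.
as_dt = s + pd.to_datetime(0)
print(as_dt.dt.strftime('%H:%M:%S').tolist())  # ['05:01:11', '01:00:00']
```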

score 0 · Answer 7 · answered Nov 12 '20 at 10:06

If you want to drop all zero-valued components (not only days), you can do it like this:

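import pandas as pd

def pd_td_fmt(td):
    # Short unit suffix for each field of Timedelta.components
    abbr = {'days': 'd', 'hours': 'h', 'minutes': 'min', 'seconds': 's',
            'milliseconds': 'ms', 'microseconds': 'us', 'nanoseconds': 'ns'}

    def fmt(t):
        # Concatenate only the non-zero fields, e.g. 1 day 2 h 30 s -> '1d2h30s'
        return "".join(f"{v}{abbr[k]}" for k, v in t.components._asdict().items() if v != 0)

    if isinstance(td, pd.Timedelta):
        return fmt(td)
    elif isinstance(td, (pd.TimedeltaIndex, pd.Series)):
        # Apply element-wise to every Timedelta in the index or Series
        return td.map(fmt)
    else:
        raise TypeError(f"expected Timedelta, TimedeltaIndex or Series, got {type(td).__name__}")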
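The function relies on `Timedelta.components`, a named tuple with the fields days, hours, minutes, seconds, milliseconds, microseconds and nanoseconds. A sketch of the filtering step on one made-up value:

```python
import pandas as pd

td = pd.Timedelta("1 days 02:00:30")
parts = td.components._asdict()  # field name -> integer value

# Keep only the non-zero fields, as pd_td_fmt does before abbreviating them.
nonzero = {k: v for k, v in parts.items() if v != 0}
print(nonzero)  # {'days': 1, 'hours': 2, 'seconds': 30}
```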
