1

tl; dr

Is it possible to convert all float values in this DataFrame pivot table from seconds to intervals in the HH:MM:SS format?

Name        Student A  Tutor
Date
2021-04-12       52.0   86.0
2021-04-13       13.0  113.0
2021-04-14        NaN   34.0
Average          32.5   77.6

Desired end result:

Name         Student A         Tutor
Date
2021-04-12    00:00:52      00:01:26
2021-04-13    00:00:13      00:01:53
2021-04-14                  00:00:34
Average       00:00:32      00:01:17

Caveats:

  • Tutor and Student A columns are dynamic so I cannot reference them by names. In fact, there might be multiple students in the table.
  • I have had no luck with applying pd.to_timedelta as it prefixes everything with "0 days".
  • The table also has the total column and row but I've removed it for clarity.

Long read

This is how the pivot table has been built:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-12', '2021-04-12', '2021-04-13', '2021-04-13', '2021-04-14'],
    'Name': ['Tutor', 'Student A', 'Student A', 'Tutor', 'Tutor'],
    'duration_seconds': [86, 52, 13, 113, 34]
})

df looks like this:

         Date       Name  duration_seconds
0  2021-04-12      Tutor                86
1  2021-04-12  Student A                52
2  2021-04-13  Student A                13
3  2021-04-13      Tutor               113
4  2021-04-14      Tutor                34

Grouping it

grouped = df.groupby(['Name', 'Date']).sum()

yields:

                       duration_seconds
Name        Date
Student A   2021-04-12               52
            2021-04-13               13
Tutor       2021-04-12               86
            2021-04-13              113
            2021-04-14               34

Converting data to the desired format

pivoted = grouped.pivot_table(
    values='duration_seconds',
    index='Date',
    columns='Name'
)

yields:

Name        Student A  Tutor
Date
2021-04-12       52.0   86.0
2021-04-13       13.0  113.0
2021-04-14        NaN   34.0

Adding the averages row

pivoted.loc['Average'] = pivoted.mean()

yields:

Name        Student A       Tutor
Date
2021-04-12       52.0   86.000000
2021-04-13       13.0  113.000000
2021-04-14        NaN   34.000000
Average          32.5   77.666667

This is how far I've got with this. The last step would be to convert all values under the student and tutor columns to intervals in the HH:MM:SS format.

This would be a good way to achieve it had it not been prefixing everything with 0 days :

pivoted.iloc[:].apply(pd.to_timedelta, unit='s')
Chris Kobrzak
  • 1,044
  • 14
  • 20

1 Answers1

1
def strfdelta(tdelta, fmt):
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


df = df.apply(
    lambda x: [
        strfdelta(
            pd.Timedelta(seconds=v),
            "{hours:02d}:{minutes:02d}:{seconds:02d}",
        )
        if pd.notna(v)
        else ""
        for v in x
    ],
)
print(df)

Prints:

           Student A     Tutor
2021-04-12  00:00:52  00:01:26
2021-04-13  00:00:13  00:01:53
2021-04-14            00:00:34
Average     00:00:32  00:01:17

The strfdelta function is from Formatting timedelta objects

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91