
I have a DataFrame; you can reproduce it by copying and running the following code:

import pandas as pd
from io import StringIO

df = """
 b_id          duration1                  duration2                          user
 
 366           NaN                        38 days 22:05:06.807430            Test
 367           0 days 00:00:05.285239     NaN                                Test
 368           NaN                        NaN                                Test
 371           NaN                        NaN                                Test
 378           NaN                        451 days 14:59:28.830482           Test
 384           28 days 21:05:16.141263     0 days 00:00:44.999706            Test
 
 466           NaN                        38 days 22:05:06.807430            Tom
 467           0 days 00:00:05.285239     NaN                                Tom
 468           NaN                        NaN                                Tom
 471           NaN                        NaN                                Tom
 478           NaN                        451 days 14:59:28.830482           Tom
 484           28 days 21:05:16.141263     0 days 00:00:44.999706            Tom

"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s\s+', engine='python')
df

My question is: how can I get the mean of each duration column for each user?

The output should look something like this (the values below are made up for illustration, not the actual means):

mean_duration1             mean_duration2                     user

8 days 22:05:06.807430    3 days 22:05:06.807430              Test
2 days 00:00:05.285239    4 days 22:05:06.807430              Tom
William

1 Answer


You can use:

out = (df
   .set_index('user')
   .filter(like='duration')
   .apply(pd.to_timedelta)
   .groupby(level=0).mean()
   .reset_index()
 )

Output:

   user               duration1                duration2
0  Test 14 days 10:32:40.713251 163 days 12:21:46.879206
1   Tom 14 days 10:32:40.713251 163 days 12:21:46.879206
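If you want the column names from the question (mean_duration1, mean_duration2), a variation along the same lines might look like the sketch below. It rebuilds the question's data directly as a DataFrame so it runs on its own; the name `duration_cols` is my own, and grouping on the `user` column instead of the index is just one way to do it, not the only one.

```python
import pandas as pd

# Same values as in the question, built directly instead of parsed from text.
df = pd.DataFrame({
    "b_id": [366, 367, 368, 371, 378, 384, 466, 467, 468, 471, 478, 484],
    "duration1": [None, "0 days 00:00:05.285239", None, None, None,
                  "28 days 21:05:16.141263"] * 2,
    "duration2": ["38 days 22:05:06.807430", None, None, None,
                  "451 days 14:59:28.830482", "0 days 00:00:44.999706"] * 2,
    "user": ["Test"] * 6 + ["Tom"] * 6,
})

# Convert the string columns to timedelta64; missing values become NaT.
duration_cols = [c for c in df.columns if "duration" in c]
df[duration_cols] = df[duration_cols].apply(pd.to_timedelta)

# Mean per user (NaT is skipped), then rename to mean_<column>.
out = (
    df.groupby("user")[duration_cols]
      .mean()
      .add_prefix("mean_")
      .reset_index()
)
print(out)
```

Selecting the duration columns explicitly before calling mean() also keeps b_id out of the result.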
mozway
  • Thanks mozway, you are a master! – William Dec 21 '22 at 20:14
  • Thank you very much for your answer. What if I have another column in the original dataframe, like 'number of cases'? I found that this column is lost if I run it this way. – William Dec 22 '22 at 18:57
  • What type is this other column? Can't you add it as a grouper along with "user"? – mozway Dec 22 '22 at 19:00
  • It is a number. I tried out = (df .set_index(['user','number of cases']) .filter(like='duration') .apply(pd.to_timedelta) .groupby(level=0).mean() .reset_index() ) but it does not work. – William Dec 22 '22 at 19:03
  • I opened a new question about this, can you help me? Thanks! https://stackoverflow.com/questions/74893111/pandas-column-lost-after-getting-mean-value-of-time-duration – William Dec 22 '22 at 19:35
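On the follow-up in the comments: in the attempt shown there, groupby(level=0) groups only on the first index level (user), so the second level ('number of cases') is dropped from the result. One possible fix, sketched below, is to group on both index levels. This assumes 'number of cases' is constant within each user; the data here is invented for illustration and is not from the original question.

```python
import pandas as pd

# Hypothetical data: the 'number of cases' values and durations are made up.
df = pd.DataFrame({
    "user": ["Test", "Test", "Tom", "Tom"],
    "number of cases": [3, 3, 5, 5],
    "duration1": ["0 days 00:00:10", "0 days 00:00:20", None, "1 days 00:00:00"],
    "duration2": [None, "2 days 00:00:00", "0 days 01:00:00", "0 days 03:00:00"],
})

out = (
    df.set_index(["user", "number of cases"])
      .filter(like="duration")
      .apply(pd.to_timedelta)
      # Group on both index levels so 'number of cases' survives.
      .groupby(level=["user", "number of cases"])
      .mean()
      .reset_index()
)
print(out)
```

If 'number of cases' varies within a user, this yields one row per (user, number of cases) pair; in that case you would probably want to aggregate that column separately (for example with a sum) instead of grouping on it.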