Extract min, max and avg using pandas DF

Question

Here is my Dataframe printed with df.to_markdown())

type	convesion time	covert json	data to convert	process completed	time taken
INFO	0 days 00:01:00	2023-03-09 18:56:57	23000	0 days 00:02:03	0 days 00:02:09
INFO	0 days 00:01:30	2023-03-09 19:00:01	30000	0 days 00:03:28	0 days 00:03:31
INFO	0 days 00:01:30	2023-03-09 19:00:01	50000	0 days 00:03:28	0 days 00:06:31
INFO	0 days 00:00:30	2023-03-09 19:00:01	1000	0 days 00:00:28	0 days 00:00:31

I want to get

Min data to convert its respective conversion time, process completed time and total time(conversion time +process
completed),time taken
Max data to convert its respective conversion time, process completed time and total time(conversion time +process completed),time taken
Avg data to convert its respective conversion time, process completed time and total time(conversion time +process
completed),time taken

i tried,

df.loc[df['data to convert']==data_min, 'convesion time']
df['time taken'] = pd.to_timedelta(df['time taken'])
df['time taken'].sum()

Am not able to get what am expecting.

Expected Output:

d	Data to convert	Conversion time	Process completed	total time(process+conversion)	Time taken
MIN	1000	00:00:30	00:00:28	00:00:58	00:00:31
MAX	50000	00:01:30	00:03:28	00:04:58	00:06:31
AVG	25500	00:01:00	00:01:58	00:02:58	00:03:31

`Am not able to get what am expecting.` - What is expecting from sample data? — jezrael, Mar 10 '23 at 08:15
OK, if change `data to convert` value `23000` to `23000000` what is `time taken` in ouput for max? — jezrael, Mar 10 '23 at 08:32
it depends on processing i can take 10 minutes or 1 hour... based on GPU, CPU — ditil, Mar 10 '23 at 08:36
I think from sample data `0 days 00:02:09` or `0 days 00:06:31` ? — jezrael, Mar 10 '23 at 08:37
I will be higher one only... but this data i won'r calculate. it comes from server — ditil, Mar 10 '23 at 08:43
Unfortunately not undertand each other. From sample data not clear, how is count `MAX` and `MN` ouput rows. MAX is count from row if `data to convert` is max? Or MAX is count from maximal of each column? — jezrael, Mar 10 '23 at 08:45
suppose data to convert value has 100, 2300, 23000 and 23000000 --> here min is 100 and will get time conversion, process time for the row 100 and max will be 23000000 and its respective time process, conversion time e.t.c — ditil, Mar 10 '23 at 08:53

jezrael · Accepted Answer · 2023-03-10T11:57:44.970

Use:

#create default index
df = df.reset_index(drop=True)

#convert columns to timedeltas
cols = ['convesion time','process completed','time taken']
df[cols] = df[cols].apply(pd.to_timedelta)

#create new column
df['total time'] = df['convesion time'].add(df['process completed'])

#convert values to numeric
df['data to convert'] = df['data to convert'].astype(int)

#get rows with maximal and minimal values
df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), 
              ['data to convert'] + cols + ['total time']]
        )
df1.index = ['MIN','MAX']

#add mean row
df1.loc['AVG'] = df1.mean()
print (df1)
     data to convert  convesion time process completed      time taken  \
MIN           1000.0 0 days 00:00:30   0 days 00:00:28 0 days 00:00:31   
MAX          50000.0 0 days 00:01:30   0 days 00:03:28 0 days 00:06:31   
AVG          25500.0 0 days 00:01:00   0 days 00:01:58 0 days 00:03:31   

         total time  
MIN 0 days 00:00:58  
MAX 0 days 00:04:58  
AVG 0 days 00:02:58

df2 = df1.unstack().to_frame().T
print (df2)
  data to convert                    convesion time                  \
              MIN      MAX      AVG             MIN             MAX   
0          1000.0  50000.0  26000.0 0 days 00:00:30 0 days 00:01:30   

                         process completed                  \
                     AVG               MIN             MAX   
0 0 days 00:01:07.500000   0 days 00:00:28 0 days 00:03:28   

                              time taken                  \
                     AVG             MIN             MAX   
0 0 days 00:02:21.750000 0 days 00:00:31 0 days 00:06:31   

                              total time                  \
                     AVG             MIN             MAX   
0 0 days 00:03:10.500000 0 days 00:00:58 0 days 00:04:58   

                          
                     AVG  
0 0 days 00:03:29.250000

EDIT: If need all maximals and all minimal rows:

#create default index
df = df.reset_index(drop=True)

#convert columns to timedeltas
cols = ['convesion time','process completed','time taken']
df[cols] = df[cols].apply(pd.to_timedelta)

#create new column
df['total time'] = df['convesion time'].add(df['process completed'])

#convert values to numeric
df['data to convert'] = df['data to convert'].astype(int)

#get rows with maximal and minimal values
df1 = df[df['data to convert'].eq(df['data to convert'].max())]
df1.index = ['MAX'] * len(df1)

df2 = df[df['data to convert'].eq(df['data to convert'].min())]
df2.index = ['MIN'] * len(df2)
print (df2)

out = pd.concat([df1, df2])

#add mean row
out.loc['AVG'] = out.mean()
print (out)

Thanks, Can i write this into excel where input DF in one sheet and output DF into another sheet in same excel? — ditil, Mar 10 '23 at 08:57
@ditil- Sure, use [this](https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas) — jezrael, Mar 10 '23 at 09:00
df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/series.py", line 3974, in aggregate result, how = aggregate(self, func, *args, **kwargs) File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/aggregation.py", line 586, in aggregate return agg_list_like(obj, arg, _axis=_axis), None File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/aggregation.py", line 672, in agg_list_like raise ValueError("no results") ValueError: no results — ditil, Mar 10 '23 at 09:01
@ditil - It means there are non numeric data. check [this](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) — jezrael, Mar 10 '23 at 09:08
I did this `df['data to convert'] = df['data to convert'].astype(int) df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), ['data to convert'] + cols + ['total time']] .set_axis(['MIN','MAX']))` am getting error --> f"Length mismatch: Expected axis has {old_len} elements, new " ValueError: Length mismatch: Expected axis has 8 elements, new values have 2 elements — ditil, Mar 10 '23 at 09:10
@ditil - Can you change `.set_axis(['MIN','MAX']))` to `.set_axis(['MIN','MAX'], inplace=False, axis=0))` ? — jezrael, Mar 10 '23 at 09:12
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/252437/discussion-between-ditil-and-jezrael). — ditil, Mar 10 '23 at 09:13

Extract min, max and avg using pandas DF

1 Answers1