1

Here is my Dataframe printed with df.to_markdown())

type convesion time covert json data to convert process completed time taken
INFO 0 days 00:01:00 2023-03-09 18:56:57 23000 0 days 00:02:03 0 days 00:02:09
INFO 0 days 00:01:30 2023-03-09 19:00:01 30000 0 days 00:03:28 0 days 00:03:31
INFO 0 days 00:01:30 2023-03-09 19:00:01 50000 0 days 00:03:28 0 days 00:06:31
INFO 0 days 00:00:30 2023-03-09 19:00:01 1000 0 days 00:00:28 0 days 00:00:31

I want to get

  • Min data to convert its respective conversion time, process completed time and total time(conversion time +process
    completed),time taken

  • Max data to convert its respective conversion time, process completed time and total time(conversion time +process completed),time taken

  • Avg data to convert its respective conversion time, process completed time and total time(conversion time +process
    completed),time taken

i tried,

df.loc[df['data to convert']==data_min, 'convesion time']
df['time taken'] = pd.to_timedelta(df['time taken'])
df['time taken'].sum()

Am not able to get what am expecting.

Expected Output:

d Data to convert Conversion time Process completed total time(process+conversion) Time taken
MIN 1000 00:00:30 00:00:28 00:00:58 00:00:31
MAX 50000 00:01:30 00:03:28 00:04:58 00:06:31
AVG 25500 00:01:00 00:01:58 00:02:58 00:03:31
ditil
  • 89
  • 8
  • `Am not able to get what am expecting.` - What is expecting from sample data? – jezrael Mar 10 '23 at 08:15
  • i will add expected output give me a minute – ditil Mar 10 '23 at 08:16
  • OK, if change `data to convert` value `23000` to `23000000` what is `time taken` in ouput for max? – jezrael Mar 10 '23 at 08:32
  • it depends on processing i can take 10 minutes or 1 hour... based on GPU, CPU – ditil Mar 10 '23 at 08:36
  • I think from sample data `0 days 00:02:09` or `0 days 00:06:31` ? – jezrael Mar 10 '23 at 08:37
  • I will be higher one only... but this data i won'r calculate. it comes from server – ditil Mar 10 '23 at 08:43
  • Unfortunately not undertand each other. From sample data not clear, how is count `MAX` and `MN` ouput rows. MAX is count from row if `data to convert` is max? Or MAX is count from maximal of each column? – jezrael Mar 10 '23 at 08:45
  • MIN, MAX and Average will be based on "data to convert" – ditil Mar 10 '23 at 08:51
  • OK, so need first my solution. – jezrael Mar 10 '23 at 08:52
  • suppose data to convert value has 100, 2300, 23000 and 23000000 --> here min is 100 and will get time conversion, process time for the row 100 and max will be 23000000 and its respective time process, conversion time e.t.c – ditil Mar 10 '23 at 08:53

1 Answers1

1

Use:

#create default index
df = df.reset_index(drop=True)

#convert columns to timedeltas
cols = ['convesion time','process completed','time taken']
df[cols] = df[cols].apply(pd.to_timedelta)

#create new column
df['total time'] = df['convesion time'].add(df['process completed'])

#convert values to numeric
df['data to convert'] = df['data to convert'].astype(int)

#get rows with maximal and minimal values
df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), 
              ['data to convert'] + cols + ['total time']]
        )
df1.index = ['MIN','MAX']

#add mean row
df1.loc['AVG'] = df1.mean()
print (df1)
     data to convert  convesion time process completed      time taken  \
MIN           1000.0 0 days 00:00:30   0 days 00:00:28 0 days 00:00:31   
MAX          50000.0 0 days 00:01:30   0 days 00:03:28 0 days 00:06:31   
AVG          25500.0 0 days 00:01:00   0 days 00:01:58 0 days 00:03:31   

         total time  
MIN 0 days 00:00:58  
MAX 0 days 00:04:58  
AVG 0 days 00:02:58  

df2 = df1.unstack().to_frame().T
print (df2)
  data to convert                    convesion time                  \
              MIN      MAX      AVG             MIN             MAX   
0          1000.0  50000.0  26000.0 0 days 00:00:30 0 days 00:01:30   

                         process completed                  \
                     AVG               MIN             MAX   
0 0 days 00:01:07.500000   0 days 00:00:28 0 days 00:03:28   

                              time taken                  \
                     AVG             MIN             MAX   
0 0 days 00:02:21.750000 0 days 00:00:31 0 days 00:06:31   

                              total time                  \
                     AVG             MIN             MAX   
0 0 days 00:03:10.500000 0 days 00:00:58 0 days 00:04:58   

                          
                     AVG  
0 0 days 00:03:29.250000  

EDIT: If need all maximals and all minimal rows:

#create default index
df = df.reset_index(drop=True)

#convert columns to timedeltas
cols = ['convesion time','process completed','time taken']
df[cols] = df[cols].apply(pd.to_timedelta)

#create new column
df['total time'] = df['convesion time'].add(df['process completed'])

#convert values to numeric
df['data to convert'] = df['data to convert'].astype(int)

#get rows with maximal and minimal values
df1 = df[df['data to convert'].eq(df['data to convert'].max())]
df1.index = ['MAX'] * len(df1)

df2 = df[df['data to convert'].eq(df['data to convert'].min())]
df2.index = ['MIN'] * len(df2)
print (df2)

out = pd.concat([df1, df2])

#add mean row
out.loc['AVG'] = out.mean()
print (out)
  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, Can i write this into excel where input DF in one sheet and output DF into another sheet in same excel? – ditil Mar 10 '23 at 08:57
  • @ditil- Sure, use [this](https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas) – jezrael Mar 10 '23 at 09:00
  • df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/series.py", line 3974, in aggregate result, how = aggregate(self, func, *args, **kwargs) File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/aggregation.py", line 586, in aggregate return agg_list_like(obj, arg, _axis=_axis), None File "/apps/tools/python/lib/python3.7/site-packages/pandas/core/aggregation.py", line 672, in agg_list_like raise ValueError("no results") ValueError: no results – ditil Mar 10 '23 at 09:01
  • am getting error @jezrael – ditil Mar 10 '23 at 09:01
  • @ditil - What is `print (df['data to convert'].dtype)` ? – jezrael Mar 10 '23 at 09:04
  • Its printing -- > object – ditil Mar 10 '23 at 09:07
  • @ditil - It means there are non numeric data. check [this](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) – jezrael Mar 10 '23 at 09:08
  • I did this `df['data to convert'] = df['data to convert'].astype(int) df1 = (df.loc[df['data to convert'].agg(['idxmin','idxmax']), ['data to convert'] + cols + ['total time']] .set_axis(['MIN','MAX']))` am getting error --> f"Length mismatch: Expected axis has {old_len} elements, new " ValueError: Length mismatch: Expected axis has 8 elements, new values have 2 elements – ditil Mar 10 '23 at 09:10
  • @ditil - Can you change `.set_axis(['MIN','MAX']))` to `.set_axis(['MIN','MAX'], inplace=False, axis=0))` ? – jezrael Mar 10 '23 at 09:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/252437/discussion-between-ditil-and-jezrael). – ditil Mar 10 '23 at 09:13