How to calculate min and max of a column for particular rows?

Question

I have a CSV file like the following:

    0  2  1  1    464  385  171    0:44:4   
    1  1  2  26    254  444  525    0:56:2   
    2  3  1  90    525  785  522    0:52:8   
    3  8  2  3    525  233  555    0:52:8  
    4  7  1  10    525  433  522    1:52:8   
    5  9  2  55     525  555  522    1:52:8   
    6  6  3  3   392  111  232    1:43:4   
    7  1  4  23    322  191  112    1:43:4   
    8  1  3  30    322  191  112    1:43:4   
    9  1  5   2   322  191  112    1:43:4   
    10  1  3  22   322  191  112    1:43:4   
    11  1  4  44   322  191  112    1:43:4   
    12  1  5  1   322  191  112    1:43:4   
    12  1  4  3    322  191  112    1:43:4   
    12  1  6  33    322  191  112    1:43:4   
    12  1  6  1    322  191  112    1:43:4 
    12  1  5  3    322  191  112    1:43:4   
    12  1  6  33    322  191  112    1:43:4   
     

    .
    .

The third column contains numbers from 1 to 6. For each of these values, I want to read columns #4 and #5 from all rows that have that value in the third column, and find the minimum and maximum separately for each value. For example, output like this:

Min for row with 1: 1  
Max for row with 1: 90

Min for row with 2:  3
Max for row with 2:  55

and so on

I can plot the data using the following code. How can I get summary statistics by group? I'd like to get several statistics for each group (mean, min, max, and the number of rows) in one call. Is that doable?

import matplotlib.pyplot as plt
import csv
x= []
y= []

with open('mydata.csv','r') as csvfile:
    ap = csv.reader(csvfile, delimiter=',')
    for row in ap:
        x.append(int(row[2]))
        # Column #4 is index 3; index 7 holds a time string such as 0:44:4, which int() cannot parse.
        y.append(int(row[3]))

plt.scatter(x, y, color = 'g',s = 4, marker='o')
plt.show()

Does this answer your question? [Get statistics for each group (such as count, mean, etc) using pandas GroupBy?](https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby) — 9769953, Apr 22 '22 at 20:57
That's not a CSV, that's a TSV, tab-separated values, or space-separated values. There's not a single comma in there, so I'm not sure how `delimiter=','` would work on this file! — Marcus Müller, Apr 22 '22 at 20:58
Please review my question, I made some changes to the csv file part. — Happypumpkin pm, Apr 22 '22 at 21:25
then you pasted a different thing than the file you're dealing with. — Marcus Müller, Apr 22 '22 at 21:54
@MarcusMüller, I need to find the min and max values for each value in the third column (1 to 6). — Happypumpkin pm, Apr 22 '22 at 22:19

Timus · Accepted Answer · 2022-04-22T23:24:21.843

One easy way would be to use Pandas with read_csv(), .groupby() and .agg():

import pandas as pd

df = pd.read_csv("mydata.csv", header=None)

def min_max_avg(col):
    return (col.min() + col.max()) / 2
    
result = df[[2, 3, 4]].groupby(2).agg(["min", "max", "mean", min_max_avg])

Result:

    3                               4                             
  min max       mean min_max_avg  min  max        mean min_max_avg
2                                                                 
1   1  90  33.666667        45.5  464  525  504.666667       494.5
2   3  55  28.000000        29.0  254  525  434.666667       389.5
3   3  30  18.333333        16.5  322  392  345.333333       357.0
4   3  44  23.333333        23.5  322  322  322.000000       322.0
5   1   3   2.000000         2.0  322  322  322.000000       322.0
6   1  33  22.333333        17.0  322  322  322.000000       322.0
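The question also asks for the number of rows in each group. A minimal sketch of one way to get that alongside the other statistics, assuming pandas named aggregation: the column names (`group`, `col3`, and so on) and the helper `mid_range` are hypothetical labels introduced here for readability, and the inline sample is just the first six lines of mydata.csv.

```python
import io
import pandas as pd

# First six lines of mydata.csv, inlined so the example is self-contained.
csv_text = """0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
"""

# Hypothetical column names; the original file has no header row.
names = ["idx", "a", "group", "col3", "col4", "b", "c", "time"]
df = pd.read_csv(io.StringIO(csv_text), header=None, names=names)


def mid_range(col):
    """Return the midpoint between the smallest and largest value."""
    return (col.min() + col.max()) / 2


# Named aggregation: each keyword is an output column,
# given as (source column, aggregation function).
summary = df.groupby("group").agg(
    n=("col3", "count"),
    col3_min=("col3", "min"),
    col3_max=("col3", "max"),
    col3_mid=("col3", mid_range),
)
print(summary)
```

This produces flat column names instead of the two-level header shown above, which can be easier to work with afterwards.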

If you don't like that you could do it with pure Python, it's only a little bit more work:

import csv

# Collect the values of columns 3 and 4, grouped by the value in column 2.
data = {}
with open("mydata.csv", "r", newline="") as file:
    for row in csv.reader(file):
        dct = data.setdefault(row[2], {})
        for col in (3, 4):
            # Convert to int so min()/max() compare numbers, not strings.
            dct.setdefault(col, []).append(int(row[col]))

min_str = "Min for group {} - column {}: {}"
max_str = "Max for group {} - column {}: {}"
for group in data:
    for col in (3, 4):
        print(min_str.format(group, col, min(data[group][col])))
        print(max_str.format(group, col, max(data[group][col])))

Result:

Min for group 1 - column 3: 1
Max for group 1 - column 3: 90
Min for group 1 - column 4: 464
Max for group 1 - column 4: 525
Min for group 2 - column 3: 3
Max for group 2 - column 3: 55
Min for group 2 - column 4: 254
Max for group 2 - column 4: 525
Min for group 3 - column 3: 3
Max for group 3 - column 3: 30
Min for group 3 - column 4: 322
Max for group 3 - column 4: 392
...
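Note that `csv.reader` yields every field as a string, so the values need converting to `int` before `min()` and `max()` compare them numerically (as strings, "26" sorts before "3"). A minimal sketch of a pure-Python variant that does the conversion and also reports count and mean, assuming an inline sample in place of mydata.csv; the names `groups` and `stats` are illustrative only.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# First six lines of mydata.csv, inlined so the example is self-contained.
sample = """0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
"""

# Map each group label (column 2) to the list of its column-3 values.
groups = defaultdict(list)
for row in csv.reader(io.StringIO(sample)):
    # int() makes the later min()/max() numeric rather than lexicographic.
    groups[row[2]].append(int(row[3]))

# Several statistics per group, computed in one pass over the groups.
stats = {
    key: {
        "count": len(vals),
        "min": min(vals),
        "max": max(vals),
        "mean": mean(vals),
    }
    for key, vals in groups.items()
}

for key, s in stats.items():
    print("group", key, s)
```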

mydata.csv:

0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
6,6,3,3,392,111,232,1:43:4
7,1,4,23,322,191,112,1:43:4
8,1,3,30,322,191,112,1:43:4
9,1,5,2,322,191,112,1:43:4
10,1,3,22,322,191,112,1:43:4
11,1,4,44,322,191,112,1:43:4
12,1,5,1,322,191,112,1:43:4
12,1,4,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4
12,1,6,1,322,191,112,1:43:4
12,1,5,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4
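The file above is comma-separated. If the real file is whitespace-separated instead, as the listing in the question appears to be, `read_csv` needs a different separator. A minimal sketch, assuming a four-line inline sample rather than the actual file:

```python
import io
import pandas as pd

# Whitespace-separated sample mirroring the listing in the question.
text = """0  2  1  1    464  385  171    0:44:4
1  1  2  26    254  444  525    0:56:2
2  3  1  90    525  785  522    0:52:8
3  8  2  3    525  233  555    0:52:8
"""

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Same grouping as in the answer, limited to min and max.
result = df[[2, 3, 4]].groupby(2).agg(["min", "max"])
print(result)
```

With a real file, the `io.StringIO(text)` argument would be replaced by the file path.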

I was wondering if pandas can give us the value for (min+max)/2 as well? — Happypumpkin pm, Apr 22 '22 at 23:11
@Happypumpkinpm As far as I know, Pandas doesn't have this function readily available, so you have to define it yourself. See the edit. — Timus, Apr 22 '22 at 23:25
