How to find min/max values in columns of csv files into JSON like format in Python?

Question

I want to take in a few csv files, each of which would looks like this:

                column1     column2        column3       
18/05/16 11:19  143501      148686          40644           
18/05/16 11:34  144070      144710          79585           
18/05/16 11:49  134070      144713          79588           
18/05/16 12:03  144070      144716          80591           
18/05/16 12:13  144070      144733          79608           
18/05/16 12:23  154790      144733          79608           
18/05/16 12:19  144070      125753          79628           
18/05/16 12:34  144070      144757          79632

I want to obtain the max and min value for each column in each csv file (excluding the date/time column) and put the data into a structure something like this:

{
    'csvfile1' :[{

        "column1": [{
            "max": "154790"
            "min": "134070"
            }],
        "column2": [{
            "max": "148686"
            "min": "125753"
            }],
        "column3": [{
            "max": "80591"
            "min": "40644"
            }]
        }],
    'csvfile2' :[{
        <same type of information here>
    }],
    'csvfile3' :[{
        <same type of information here>
    }]
}

This is my script so far:

max_min_data={}
for name in csvfile_names:
    f = open(name+'.csv', "r")

    #Only want to check columns 1 to 3 not the date/time
    for number in range(1,3):
        f.seek(0)
        column_names = next(f) next(f) # Skip header
        max_value = max(int(row[number]) for row in csv.reader(f))
        min_value = min(int(row[number]) for row in csv.reader(f))

        max_min_data[name]=[column_names[number]]
        max_min_data[name][column_names[number]]=['max','min']
        max_min_data[name][column_names[number]]['max']=max_value
        max_min_data[name][column_names[number]]['min']=min_value

        print "MAX:"
        print '\t\t'+str(max_value)
        print "MIN:"
        print '\t\t'+str(min_value)

^ I think that meant "show the output you get that you don't want." — Tadhg McDonald-Jensen, May 18 '16 at 15:14
Make sure `row[number]` is a number (integer or float) and not a string. — oystein, May 18 '16 at 15:14
comparing strings doesn't do the same thing as comparing numbers, it checks in terms of the first character then second and so on, so `max('123', '23') -> '23'` since `'2'` is greater then `'1'` — Tadhg McDonald-Jensen, May 18 '16 at 15:15
Possible duplicate of [compare two python strings that contain numbers](http://stackoverflow.com/questions/6062973/compare-two-python-strings-that-contain-numbers) — Tadhg McDonald-Jensen, May 18 '16 at 15:18
@ozgur Thank you that solved the max issue but what would be the best method to put that data into the format I have above? — Catherine, May 18 '16 at 15:19
Can you show us the code that converts the given csv to the above dictionary? — Ozgur Vatansever, May 18 '16 at 15:21
@ozgur I have updated my code above to include the code that converts the given csv to the above dictionary — Catherine, May 18 '16 at 15:31
@Catherine Define "best method"? Fastest, cleanest, least number of lines, no dependencies? — oystein, May 18 '16 at 15:40
@oystein My current method as shown in my code isn't working, by "best" I meant "cleanest" or "most pythonic way" of creating the dictionary structure described above — Catherine, May 18 '16 at 15:44

score 2 · Answer 1 · answered May 18 '16 at 15:17

Have you tried this :

max_value = max(int(row[number]) for row in csv.reader(f))

Indeed, when you parse the CSV file, the reader assumes these values are strings, you are therefore comparing them by string comparison, and not getting the actual max value.

score 1 · Accepted Answer · answered May 18 '16 at 15:55

If using pandas is allowed, this is very simple. You need to load the csv into a dataframe, use the describe method, select only the 'min' and 'max' rows, and then create a JSON object from the dataframe. The important thing here is that pandas already provides all of that functionality.

Load your csv file as a dataframe with pandas.read_csv. Assuming you have it as a dataframe df like this:

                column1  column2  column3
18/05/16 11:19   143501   148686    40644
         11:34   144070   144710    79585
         11:49   134070   144713    79588
         12:03   144070   144716    80591
         12:13   144070   144733    79608
         12:23   154790   144733    79608
         12:19   144070   125753    79628
         12:34   144070   144757    79632

Then you use desc = df.describe() to generate a dataframe desc like:

            column1        column2       column3
count       8.000000       8.000000      8.000000
mean   144088.875000  142850.125000  74860.500000
std      5545.130565    7045.836226  13829.840615
min    134070.000000  125753.000000  40644.000000
25%    143927.750000  144712.250000  79587.250000
50%    144070.000000  144724.500000  79608.000000
75%    144070.000000  144739.000000  79629.000000
max    154790.000000  148686.000000  80591.000000

If you subset the rows of the dataframe above with desc.loc[['min','max']], you get:

      column1  column2  column3
min   134070   125753    40644
max   154790   148686    80591

Then, you can just get it as a JSON object with a call to to_json, like desc.loc[['min','max']].to_json() which can either save directly to a file or generate a string (take a look at the docs).

From there, composing multiple such processes and writing to file should be straightforward.

How to find min/max values in columns of csv files into JSON like format in Python?

2 Answers2