find max of a column in a csv file using python

Question

I am trying to find the max of the column below in a CSV file:

list['1154293', '885773', '-448704', '563679', '555394', '631974', '957395', '1104047', '693464', '454932', '727272', '125016', '339251', '78523', '977084', '1158718', '332681', '-341227', '173826', '742611', '1189806', '607363', '-1172384', '587993', '295198', '-300390', '468995', '698452', '967828', '-454873', '375723', '1140526', '83836', '413189', '551363', '1195111', '657081', '66659', '803301', '-953301', '883934']

I ran the code I wrote:

    for row in csvReader:
        Revenue.append(row[1])
        max_revenue=max(Revenue)
        print("max revenue"+str(max_revenue))

But somehow it is not fetching the max value. The output I am getting is:

        max revenue 977084

Please advise.

Because it's treating them as strings... you need to get the max of the integer values. `"9"` is greater than `"11"` if it's a string — TemporalWolf, Mar 07 '18 at 18:50
Basically a dupe of https://stackoverflow.com/questions/7368789/convert-all-strings-in-a-list-to-int — pault, Mar 07 '18 at 18:53
Apart from the issue with strings, don't put the `max()` inside the loop. — Matt Hall, Mar 07 '18 at 18:57
Possible duplicate of [Convert all strings in a list to int](https://stackoverflow.com/questions/7368789/convert-all-strings-in-a-list-to-int) — Abhisek Roy, Mar 07 '18 at 18:57
Why is this tagged as both python-3.x and python-2.7? Do you specifically need code that works with both? Or are you expecting some relevant difference between the two that you need us to take into account? — abarnert, Mar 07 '18 at 19:18
Possible duplicate of [Avoid lexicographic ordering of numerical values with Python min() max()](https://stackoverflow.com/questions/47294468/avoid-lexicographic-ordering-of-numerical-values-with-python-min-max) — physlexic, Mar 07 '18 at 19:22

abarnert · Accepted Answer · 2018-03-07T22:48:05.020

The problem here is that you're building a list of the column-1 strings, but then expecting to find the max as a number, not as a string.
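To see the difference concretely, here is a minimal sketch using a few of the values from the question. The variable names are only illustrative, not taken from the original code:

```python
# A few of the values from the question, still as strings
revenue = ['1154293', '885773', '977084', '1195111', '-448704']

# Strings are compared character by character, so '9...' beats '1...'
max_as_text = max(revenue)

# Converting to int compares the actual numeric values
max_as_number = max(int(value) for value in revenue)

print(max_as_text)
print(max_as_number)
```

The first result is the same wrong answer the question reports; the second is the numeric maximum of this sample.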

You could fix that by building a list of the column-1 strings mapped to integers, as other answers show:

for row in csvReader:
    Revenue.append(int(row[1]))
max_revenue=max(Revenue)

But another way is to use a key function for max:

for row in csvReader:
    Revenue.append(row[1])
max_revenue = max(Revenue, key=int)

Even better, you can use the same idea to not need that whole separate Revenue list:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))

This means you get the whole original row, not just the integer value. So, if, say, column 2 is the username that goes with the revenue in column 1, you can do this:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
best_salesman_name = max_revenue_row[2]

This also avoids building a whole extra giant list in memory; it just reads each row into memory one at a time and then discards them, and only remembers the biggest one.

Which is usually great, but it has one potential problem: if you actually need to scan the values two or more times instead of just once, the first time already consumed all the rows, so the second time won't find any. For example, this will raise an exception in the second call:

max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
min_revenue_row = min(csvReader, key=lambda row: int(row[1]))
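A small sketch of that failure, assuming a hypothetical two-row CSV held in an `io.StringIO` in place of the real file (the data and the `second_scan_failed` flag are made up for illustration):

```python
import csv
import io

# Hypothetical two-row CSV standing in for the real file
data = io.StringIO("Jan,100\nFeb,-50\n")
csv_reader = csv.reader(data)

# First scan: consumes every row in the reader
max_revenue_row = max(csv_reader, key=lambda row: int(row[1]))

second_scan_failed = False
try:
    # The reader is already exhausted, so min() sees no rows at all
    min_revenue_row = min(csv_reader, key=lambda row: int(row[1]))
except ValueError as exc:
    second_scan_failed = True
    print("second scan failed:", exc)
```

The exact wording of the `ValueError` message can differ between Python versions, so it is best not to depend on it.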

The ideal solution is to reorganize your code to only scan the rows once. For example, if you understand how min and max work, you could build your own min_and_max function that does both at the same time, and then use it like this:

min_revenue_row, max_revenue_row = min_and_max(
    csvReader, key=lambda row: int(row[1]))
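For reference, one possible way to write such a helper is sketched below. `min_and_max` is not a built-in or a library function; this is just one hypothetical single-pass implementation, and the sample rows are invented:

```python
def min_and_max(iterable, key=None):
    """Return (smallest, largest) item of iterable in a single pass."""
    if key is None:
        key = lambda item: item
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        # Mirror min()/max(), which also raise ValueError on empty input
        raise ValueError("min_and_max() arg is an empty sequence") from None
    smallest = largest = first
    smallest_key = largest_key = key(first)
    for item in iterator:
        item_key = key(item)  # compute the key once per item
        if item_key < smallest_key:
            smallest, smallest_key = item, item_key
        if item_key > largest_key:
            largest, largest_key = item, item_key
    return smallest, largest


# Hypothetical rows; iter() makes them single-use, like a csv reader
rows = iter([['Jan', '100'], ['Feb', '-50'], ['Mar', '300']])
min_revenue_row, max_revenue_row = min_and_max(rows, key=lambda row: int(row[1]))
print(min_revenue_row, max_revenue_row)
```

Because it only ever calls `next` on the iterator, this works on a reader that can be consumed just once.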

But sometimes that's not possible, or at least not possible in a way you can figure out how to write readably. I'll assume you don't know how to write min_and_max. So, what can you do?

You have two less than ideal, but often still acceptable, options: either read the entire file into memory, or read the file multiple times. Here are both, first the in-memory version, then the version that reads the file twice:

rows = list(csvReader) # now it's in memory, so we can reuse it
max_revenue_row = max(rows, key=lambda row: int(row[1]))
min_revenue_row = min(rows, key=lambda row: int(row[1]))

with open(csvpath) as f:
    csvReader = csv.reader(f)
    max_revenue_row = max(csvReader, key=lambda row: int(row[1]))
with open(csvpath) as f:
    # whole new reader, so it doesn't matter that we used up the first
    csvReader = csv.reader(f)
    min_revenue_row = min(csvReader, key=lambda row: int(row[1]))

In your case, if the CSV file is as small as it seems, it doesn't really matter that much, but I'd probably do the first one.
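Putting the first option together end to end might look like the sketch below. It writes a small temporary CSV so that it can run on its own; the file contents and the layout (name in column 0, revenue in column 1, no header row) are assumptions for illustration only:

```python
import csv
import os
import tempfile

# Create a small hypothetical CSV file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    csv.writer(tmp).writerows([
        ["Jan", "1154293"],
        ["Feb", "-448704"],
        ["Mar", "977084"],
    ])
    csvpath = tmp.name

# Read the file once and keep every row in memory
with open(csvpath, newline="") as f:
    rows = list(csv.reader(f))

# The list can be scanned as many times as needed
max_revenue_row = max(rows, key=lambda row: int(row[1]))
min_revenue_row = min(rows, key=lambda row: int(row[1]))

os.remove(csvpath)  # clean up the temporary file

print("max:", max_revenue_row)
print("min:", min_revenue_row)
```

If the real file has a header row, it would need to be skipped before calling `int` on column 1.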

Also, can you recommend the best online tutorials for Python beginners like me? — user1592147, Mar 07 '18 at 19:28
@user1592147 I have no idea what tutorials are good nowadays, but I'll bet the python-list mailing list (either searching the archives, or joining and asking) is a good place to find that information. — abarnert, Mar 07 '18 at 19:32
I included the above code and executed it, but I am getting the error " '>' not supported between instances of 'function' and '_csv.reader'". Any idea why? — user1592147, Mar 07 '18 at 19:33
@user1592147 Oops, I forgot the `key=` in one version; fixed. — abarnert, Mar 07 '18 at 19:44
Can we use a similar min function instead of max to find the minimum of the values? — user1592147, Mar 07 '18 at 19:59
I am getting this error: `min_revenue_row =min(csvReader, key=lambda row: int(row[1]))` ValueError: min() arg is an empty sequence — user1592147, Mar 07 '18 at 20:07
@user1592147 Yes, there is a `min` function that works just like `max`. — abarnert, Mar 07 '18 at 22:31
@user1592147 Meanwhile, both `min` and `max` will raise an exception if called on an empty sequence. Without seeing your full code, my guess is that you don't really have an empty CSV file; what's happening is that you're using the same `csvReader` twice in a row. I'll edit my answer to explain more. — abarnert, Mar 07 '18 at 22:32

score 0 · Answer 2 · answered Mar 07 '18 at 18:53

This should work. Since the elements of your list are strings, you need to convert them to int using map(int, a) first.

a=['1154293', '885773', '-448704', '563679', '555394', '631974', '957395', '1104047', '693464', '454932', '727272', '125016', '339251', '78523', '977084', '1158718', '332681', '-341227', '173826', '742611', '1189806', '607363', '-1172384', '587993', '295198', '-300390', '468995', '698452', '967828', '-454873', '375723', '1140526', '83836', '413189', '551363', '1195111', '657081', '66659', '803301', '-953301', '883934']
print(max(map(int, a)))

Abhisek Roy

Thanks. How can I find the name in col 0 that has the max revenue? – user1592147 Mar 07 '18 at 19:14
Use `.index` to get the index of the highest element and print the same index for the other column. – Abhisek Roy Mar 07 '18 at 19:16
That's a bad idea. It means re-searching (in an exhaustive linear search) to find the same row you already found, so you're doubling the work, both conceptually and as far as performance. – abarnert Mar 07 '18 at 19:17

score 0 · Answer 3 · answered Mar 07 '18 at 18:53

I think the problem is with the data type. Because your numbers are wrapped in quotes (''), they are interpreted as strings, so max returns the largest value under string comparison rather than numeric comparison.

You may want to convert each string to an integer, like this:

new_list = [int(number) for number in old_list]

Hope this helps.

kibs

Can be done much more sensibly using map. No need to iterate. – Abhisek Roy Mar 07 '18 at 18:54
@AbhisekRoy what do you think [map](https://stackoverflow.com/a/10973817/5858851) does? – pault Mar 07 '18 at 18:57
My bad. I thought map functions are faster. I did some research and found out that they take almost the same time. @pault – Abhisek Roy Mar 07 '18 at 18:59
@AbhisekRoy Performance between map and comprehensions is rarely the important question—except for the question of whether you need a list (that you can iterate over and over) or an iterator (which you can only use once, but doesn't waste time and space building the whole list). If you want the latter, either `map` or a generator expression is fine. If you want the former, use a list comprehension. We don't know which one the user wants. – abarnert Mar 07 '18 at 19:08
Hi , How do i find the name, which is in col[1] corresponding to max i found in col[2] – user1592147 Mar 07 '18 at 19:09
@AbhisekRoy The other question is which one is more readable. In this case, because the expression is just calling a function on each element, `map` probably is actually more readable, at least if we want an iterator rather than a list. But I still wouldn't call it "much more sensibly", just "slightly more readably". – abarnert Mar 07 '18 at 19:09
@user1592147 have you tried using Pandas for reading and analyzing your csv? I think you could find working with dataframes useful. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html – kibs Mar 07 '18 at 19:14

score 0 · Answer 4 · answered Mar 07 '18 at 18:55

Thank you all.

I converted the values to int:

Revenue.append(int(row[1]))

Now it works fine.

Thanks again.

user1592147

I would be cautious here; it appears you still don't understand what is happening. – pstatix Mar 07 '18 at 18:56
It's important that you understand what you were doing wrong. I suggest you accept the answer that explains what was going wrong, and try not to add your own answer. – Reck Mar 07 '18 at 19:02
Please let me know what I am doing wrong. I am very new to Python; I got the output by making the change shown above. – user1592147 Mar 07 '18 at 19:07
