
I would like to find the maximum length of each column in a tab-delimited CSV file. I can find the maximum length of one column by using this:

import csv
oldlen = 0
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    for row in test:
        if len(row[0]) > oldlen:
            oldlen = len(row[0])  # remember the longest value seen so far
print(oldlen)

If I want to do this for all columns (and count them), I could just change the index in row[] manually, but I want to learn, so I tried this:

with open(mfile) as csvfile:
    test = csv.reader(csvfile,dialect='excel-tab')
    ncol=len(test[0])
    for column in test:
        for row in test:
            if len(row[column]) > oldlen:
                newlen = len(row[0])
        print (column,newlen)

Which, of course, doesn't make programmatic sense, but I hope it shows what I intend. I have to go through the columns first so I can get the maximum length of each column across all rows.

jer99

2 Answers


You can transpose the rows into columns with the zip() function:

with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    columns = list(zip(*test))

and then get the maximum length per column:

for col in columns:
    print(max(len(value) for value in col))
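
For reference, here is a self-contained sketch of the transpose approach. The in-memory sample data and the function name `max_lengths_by_column` are made up for illustration. It also swaps in `itertools.zip_longest` for `zip()`, on the assumption that some rows may have fewer fields than others. Plain `zip()` stops at the shortest row, which would silently drop trailing columns.

```python
import csv
import io
from itertools import zip_longest


def max_lengths_by_column(lines):
    """Return a list holding the longest field length of each column."""
    reader = csv.reader(lines, dialect='excel-tab')
    # Transpose rows into columns; short rows are padded with empty strings.
    columns = zip_longest(*reader, fillvalue='')
    return [max(len(value) for value in col) for col in columns]


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("id\tname\tcity\n1\tAlexander\tRome\n22\tBo\n")
result = max_lengths_by_column(sample)
print(result)
```

Note that transposing this way reads every row into memory at once, which is fine for small files but may not suit very large ones.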
Martijn Pieters

You can use a dict that maps each column number to its maximum length, and update it by looping over each column of each row.

lengths = {}
with open(mfile) as csvfile:
    test = csv.reader(csvfile, dialect='excel-tab')
    for row in test:
        for colno, col in enumerate(row):
            lengths[colno] = max(len(col), lengths.get(colno, 0))

The number of columns will be len(lengths), and the maximum length of each column will be accessible as lengths[0] for the first column, lengths[1] for the second, and so on.
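
As a rough illustration of reading the result back, the sketch below wraps the same loop in a function and runs it on in-memory sample data. The function name `column_lengths` and the sample rows are hypothetical, not part of the original file.

```python
import csv
import io


def column_lengths(lines):
    """Map column number -> longest field length, reading row by row."""
    lengths = {}
    for row in csv.reader(lines, dialect='excel-tab'):
        for colno, col in enumerate(row):
            lengths[colno] = max(len(col), lengths.get(colno, 0))
    return lengths


# Hypothetical sample data; the last row is deliberately one field short.
sample = io.StringIO("id\tname\tcity\n1\tAlexander\tRome\n22\tBo\n")
lengths = column_lengths(sample)

print(len(lengths))  # number of columns seen
for colno in sorted(lengths):
    print(colno, lengths[colno])
```

Because each row is processed as it is read, this version never holds the whole file in memory, and rows with fewer fields simply leave the later columns untouched.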

Jon Clements
  • It works, but I'm confused about the colno, col part. How does it know to go through the columns? Are these reserved words? – jer99 Jul 13 '15 at 14:16
  • @jer99 think of a file as an iterable of rows... each row is itself iterable, so iterating over a row, is going through each column in that row... There's no reserved words... you can name them what you want - I opted for `colno` (column number) and `col` (column value) – Jon Clements Jul 13 '15 at 14:55
  • So the list automatically contains an index and a value. So I tried a statement like "...for rowid, row in test:" and it says there are too many values to unpack (expected 2). That's a bit confusing to me. I'm assuming that "test" is a list with an index and a value as well. – jer99 Jul 13 '15 at 17:06
  • @jer99 the [enumerate function](https://docs.python.org/3/library/functions.html#enumerate) returns a tuple of index and value... iterating over a list directly only gives the values... – Jon Clements Jul 13 '15 at 17:09
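
To make the difference discussed in the comments concrete, here is a small sketch using a plain list of rows as a stand-in for what the reader yields (the data is made up). Iterating directly yields each row as a whole, so unpacking it into two names fails when a row has three fields. Wrapping the iterable in `enumerate()` is what produces the (index, value) pairs.

```python
# Hypothetical rows, shaped like the lists a csv reader yields.
rows = [['id', 'name', 'city'], ['1', 'Alexander', 'Rome']]

# Direct iteration gives one 3-item row at a time, so unpacking
# it into two names raises ValueError.
try:
    for rowid, row in rows:
        pass
except ValueError as exc:
    error = str(exc)
    print(error)

# enumerate() pairs each row with its index, so two names now fit.
pairs = [(rowid, row[1]) for rowid, row in enumerate(rows)]
print(pairs)
```

The same applies to the reader itself: `for rowid, row in enumerate(test):` would give a row number alongside each row.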