
I am working with a .csv file. I wrote this code to count the number of times each value in the year column occurs in the dataset.

I keep getting an IndexError: list index out of range in line 10, at row_year = suspension[5], whenever I run the code on my own machine, but the code runs fine when I run it on the Dataquest site.

The csv dataset has 7 columns; the 5th column represents years.

import csv

file  = open("nfl_suspensions_data.csv")
nfl_suspensions = list(csv.reader(file))
nfl_suspensions = nfl_suspensions[1:]

years = {}

for suspension in nfl_suspensions:
    row_year = suspension[5]
    if row_year in years:
        years[row_year] = years[row_year] + 1
    else:
        years[row_year] = 1

print(years)
Greenonline

1 Answer


At least one row in your data is too short: suspension[5] indexes past the end of that row's list. Also, if the year really is the 5th column, you should use suspension[4] to access it, since Python indexes are 0-based.
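A minimal sketch of both points, using an in-memory sample instead of the real file. The header names and rows are made up for illustration and are not the actual dataset layout. A blank line comes back from csv.reader as an empty list, so any index into it fails, and the 5th column sits at index 4.

```python
import csv
import io

# Hypothetical sample: header, one complete 7-column row, one blank line.
sample = "c1,c2,c3,c4,year,c6,c7\nna,na,na,na,2014,na,na\n\n"

rows = list(csv.reader(io.StringIO(sample)))[1:]  # drop the header

print(rows)        # the blank line shows up as an empty list
print(rows[0][4])  # 5th column -> index 4

try:
    rows[1][5]     # indexing into the empty row
except IndexError as exc:
    print("IndexError:", exc)
```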

Use error handling:

import csv

file  = open("nfl_suspensions_data.csv")
nfl_suspensions = list(csv.reader(file))
nfl_suspensions = nfl_suspensions[1:]

years = {}
for line_nr, suspension in enumerate(nfl_suspensions):
    try:
        row_year = suspension[5]
    except IndexError:
        # line_nr is 0-based and counts data rows only (header removed),
        # so line_nr + 1 is the 1-based data row number
        print("Data corrupt: fewer than 6 entries. Line:", line_nr+1)
        print(suspension)

        # skip this data
        continue
    if row_year in years:
        years[row_year] = years[row_year] + 1
    else:
        years[row_year] = 1

print(years)

This follows Python's "easier to ask forgiveness than permission" (EAFP) philosophy.
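For comparison, the "look before you leap" alternative checks the row length before indexing. The sketch below puts both styles side by side on made-up rows; year_eafp and year_lbyl are hypothetical helper names, not part of the code above.

```python
def year_eafp(row, index=5):
    """Try the access first and handle the failure afterwards."""
    try:
        return row[index]
    except IndexError:
        return None


def year_lbyl(row, index=5):
    """Check that the row is long enough before accessing it."""
    if len(row) > index:
        return row[index]
    return None


# Made-up rows: one complete, one truncated, one empty.
rows = [["na"] * 5 + ["1999", "na"], ["na"] * 4, []]

print([year_eafp(r) for r in rows])
print([year_lbyl(r) for r in rows])
```

Both helpers return the same values here; which style reads better is largely a matter of taste.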


You should also switch to

with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]

which is the preferred way of reading files, because the file is closed automatically when the block ends. See python.org - reading and writing files (2nd code example block).
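A slightly fuller sketch of the same idea. The csv module documentation recommends opening files with newline="", and next(reader, None) skips the header without slicing the list. The throwaway file and its contents are made up so the snippet runs on its own.

```python
import csv
import os
import tempfile

# Write a tiny demo file (hypothetical data) so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="") as f:
    f.write("a,b,c\n1,2,3\n4,5,6\n")

with open(path, newline="") as file:
    reader = csv.reader(file)
    header = next(reader, None)  # consume the header row, if there is one
    data = list(reader)          # the remaining rows

print(header)
print(data)
```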

You could leverage collections.defaultdict as well:

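from collections import defaultdict

years = defaultdict(int)  # instead of years = {}

and replace the if/else block inside the loop with a single line:

    # if row_year in years:
    #     years[row_year] = years[row_year] + 1
    # else:
    #     years[row_year] = 1
    years[row_year] += 1  # missing keys start at int() == 0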
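Put together, the defaultdict variant of the loop might look like the sketch below. It runs on made-up in-memory rows rather than the real file, with one deliberately short row to show that the error handling still applies.

```python
from collections import defaultdict

# Made-up rows with the year at index 5, plus one truncated row.
rows = [["na"] * 5 + [year, "na"] for year in ("1999", "2000", "1999")]
rows.append(["na"] * 4)

years = defaultdict(int)
for row in rows:
    try:
        years[row[5]] += 1  # row[5] is evaluated first, so a short row raises here
    except IndexError:
        continue            # skip rows that are too short

print(dict(years))
```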

or use collections.Counter
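In its simplest form, Counter can consume a generator directly. This is a sketch on made-up rows; the length check plays the role of the error handling above.

```python
from collections import Counter

# Made-up rows with the year at index 5, plus one truncated row.
rows = [["na"] * 5 + [year, "na"] for year in ("1999", "2000", "1999")]
rows.append(["na"] * 4)

# Count the year field of every row that actually has an index 5.
years = Counter(row[5] for row in rows if len(row) > 5)

print(years.most_common())
```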


Shorter code including file generation (with year at row[5] == 6th column) that accomplishes your task:

import csv
from collections import Counter


# Create a demo data file with errors:    
with open("nfl_suspensions_data.csv","w") as f:
    for inter in range(1,10):
        for y in range(1980,2001,inter):
            f.write(f"na,na,na,na,na,{y},na,na\n")
        # corrupt line
        f.write(f"na,na,na,na\n")


# process and count the years:
with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]

# keep only rows that have an index 5, then transpose rows into columns
as_columns = list(zip(*[row for row in nfl_suspensions if len(row) > 5]))
print(Counter(as_columns[5]))

Output:

Counter({'1980': 8, '1992': 5, '1998': 5, '1986': 4, '1988': 4, '1996': 4,
         '2000': 4, '1984': 3, '1989': 3, '1990': 3, '1994': 3, '1995': 3, 
         '1982': 2, '1983': 2, '1985': 2, '1987': 2, '1981': 1, '1991': 1, 
         '1993': 1, '1997': 1, '1999': 1})

Your logic fixed, applied to the data generated above:

def your_code_fixed(sus):
    years = {}
    for line_nr, suspension in enumerate(sus):
        try:
            row_year = suspension[5]
        except IndexError:
            # line_nr is 0-based and counts data rows only (header removed),
            # so line_nr + 1 is the 1-based data row number
            print("Data corrupt: fewer than 6 entries. Line:", line_nr+1)
            print(suspension)

            # skip this data
            continue
        if row_year in years:
            years[row_year] = years[row_year] + 1
        else:
            years[row_year] = 1
    print(years)

with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]

your_code_fixed(nfl_suspensions)

Output with above data file:

Data corrupt: fewer than 6 entries. Line: 21
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 33
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 41
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 48
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 54
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 59
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 63
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 67
['na', 'na', 'na', 'na']
Data corrupt: fewer than 6 entries. Line: 71
['na', 'na', 'na', 'na']

{'1981': 1, '1982': 2, '1983': 2, '1984': 3, '1985': 2, '1986': 4, '1987': 2,
 '1988': 4, '1989': 3, '1990': 3, '1991': 1, '1992': 5, '1993': 1, '1994': 3, 
 '1995': 3, '1996': 4, '1997': 1, '1998': 5, '1999': 1, '2000': 4, '1980': 8}
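
To check whether a local copy of the file contains short or blank rows, something along these lines could help. It is only a sketch: find_short_rows is a hypothetical helper, and the demo file is made up so the snippet runs on its own. It relies on the reader's line_num attribute, which reports the line of the source file, header included.

```python
import csv
import os
import tempfile


def find_short_rows(path, min_fields=6):
    """Return (file_line_number, row) for every row with too few fields."""
    short = []
    with open(path, newline="") as file:
        reader = csv.reader(file)
        for row in reader:
            if len(row) < min_fields:
                short.append((reader.line_num, row))
    return short


# Made-up demo file: header, one good row, a blank line, a truncated row.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="") as f:
    f.write("h1,h2,h3,h4,h5,year,h7\n")
    f.write("na,na,na,na,na,1999,na\n")
    f.write("\n")
    f.write("na,na,na\n")

print(find_short_rows(path))
```

Running find_short_rows("nfl_suspensions_data.csv") on your own copy should list the lines that trigger the IndexError, if there are any.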
Patrick Artner
  • This is useful, but it still baffles me that it works on a tutorial site but raises an IndexError on my computer – Akano Benjamin Nov 03 '18 at 11:41
  • @AkanoBenjamin See the working example for a file with errors in it. Your demo file probably has a trailing newline or something like that; check it. The file on the site has correct data – Patrick Artner Nov 03 '18 at 11:48