
I can't figure out why this gives me an 'IndexError: list index out of range'. I am reading from a simple CSV file and trying to extract the comma-separated values.

with open('project_twitter_data.csv','r') as twf:

    tw = twf.readlines()[1:]  # I don't need the very first line

    for i in tw:
        linelst = i.strip().split(",")

        RT = linelst[1]
        RP = linelst[2]

        rows = "{}, {}".format(RT,RP)

My output looks like this:


print(tw) # the original strings.
..\nBORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6\nREASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0\nHOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0\n\n"

print (i)
..
BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6

REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0

HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0

print(linelst)
..
['BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.', '4', '6']
["REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love", '19', '0']
["HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a", '0', '0']
['']

print(rows) 
..
4, 6
19, 0
0, 0


# the error
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-f27e87689f41> in <module>
     6         linelst = i.strip().split(",")
     7 #        print(linelst)
----> 8         RT = linelst[1]
     9         RP = linelst[2]
   

IndexError: list index out of range

What am I doing wrong?

I have also noticed that a list containing only an empty string, [''], appears at the very end of my lists after I use strip().split(","). I can drop the last line with twf.readlines()[1:][:-1], yet the error still persists. Thank you for any advice.

Bluetail

1 Answer


Your final line, after stripping, is empty, so split produces a list of just the empty string.
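
A minimal sketch of that behaviour, assuming the file ends with a blank line (the `last_line` value is made up for illustration and is not taken from your file):

```python
# Hypothetical final line of the file: nothing but a newline.
last_line = "\n"

# strip() leaves an empty string; splitting an empty string on ","
# still returns one element, the empty string itself.
fields = last_line.strip().split(",")
print(fields)
print(len(fields))

# Index 0 is the only valid index, so index 1 raises IndexError.
try:
    fields[1]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(index_error_raised)
```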

The simplest solution is to explicitly skip empty lines:

with open('project_twitter_data.csv','r') as twf:

    next(twf, None)  # Advance past first line without needing to slurp whole file into memory and
                     # slice it, tying peak memory usage to max line size, not size of file

    for line in twf:
        line = line.strip()
        if not line:
            continue
        linelst = line.split(",")

        # If non-empty, but incomplete lines should be ignored:
        if len(linelst) < 3:
            continue

        RT = linelst[1]
        RP = linelst[2]

        rows = "{}, {}".format(RT,RP)

Or simpler, using EAFP patterns and the csv module, which you should always be using when dealing with CSV files (the format is a lot more complex than just "split on commas"):

import csv

with open('project_twitter_data.csv', 'r', newline='') as twf:  # newline='' needed for proper CSV dialect handling
    csvf = csv.reader(twf)
    next(csvf, None)  # Advance past first row without needing to slurp whole file into memory and
                      # slice it, tying peak memory usage to max line size, not size of file

    for row in csvf:
        try:
            RT, RP = row[1:3]
        except ValueError:
            continue  # Didn't have enough elements, incomplete line
 
        rows = "{}, {}".format(RT,RP)

Note: In both cases, I made some minor improvements to avoid large temporary lists, and tweaked some minor things to improve readability (naming a str variable i is bad form; i is generally used for indices, or at least integers, and you had a clearer name readily available, so even a placeholder like x would be inappropriate).
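
To illustrate why plain "split on commas" is not enough, here is a small sketch comparing it with `csv.reader` on a quoted field that itself contains a comma. The header names and the sample row are hypothetical, not taken from your file:

```python
import csv
import io

# Hypothetical data: the tweet text is quoted because it contains a comma.
sample = 'text,retweets,replies\n"Hello, world",4,6\n'

# Naive approach: the quoted field is cut in two, giving four pieces.
naive = sample.splitlines()[1].split(",")
print(naive)

# csv.reader understands the quoting and returns three fields.
rows = list(csv.reader(io.StringIO(sample, newline="")))
parsed = rows[1]
print(parsed)
```

With the naive split, index 1 and index 2 would no longer be the two counts, so the code would silently read the wrong columns rather than fail.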

ShadowRanger
  • yep! magically, this error is gone! was it the empty line that caused it? – Bluetail Jan 29 '21 at 15:55
  • @bluetail: Yeah. The empty line converted to a length one `list` (only valid index was `0`). As soon as you tried to index at index `1`, it died with an `IndexError`. Slicing wouldn't die, but the resulting slice might be smaller than anticipated, leading to a `ValueError` when you try to unpack it to more targets than it has elements, which the second example I gave relies on to simplify the code. – ShadowRanger Jan 29 '21 at 16:00
  • yeah, I got it. if len(linelst) < 3: print (linelst) indeed prints out [' '] – Bluetail Jan 29 '21 at 16:12
  • do you have a link to the documentation on next() where it mentions the parameters? I have tried to look up next( ,None) and could only find this. https://docs.python.org/3/library/functions.html – Bluetail Jan 29 '21 at 16:42
  • @bluetail: That link has the info ([direct section link](https://docs.python.org/3/library/functions.html#next)). The first parameter is an iterator (file-like objects and `csv.reader` are iterators of their lines/rows). The second parameter is a default value to return if the iterator is exhausted. `next(someiterator, None)` is a cheap way to throw away a single value from an iterator if there is a value available, and do nothing otherwise. It matches your original code by discarding the first line/row from the file, without risking an exception if the file is empty (has no lines at all). – ShadowRanger Jan 29 '21 at 18:35
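
The two behaviours discussed in the comments above can be checked with a short sketch. The lists stand in for a file and for a row from a blank line; all values are made up for illustration:

```python
# next() with a default: discards one item if there is one.
lines = iter(["header\n", "some text,4,6\n"])
skipped = next(lines, None)      # the header line is consumed
remaining = list(lines)          # only the data line is left

# On an exhausted (empty) iterator the default is returned
# instead of StopIteration being raised.
empty = iter([])
result = next(empty, None)

# Slicing past the end does not raise; it just returns a shorter list.
short_row = [""]
sliced = short_row[1:3]

# Unpacking that shorter list into two names raises ValueError,
# which is what the csv example catches to skip incomplete rows.
try:
    RT, RP = sliced
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(skipped, remaining, result, sliced, unpack_failed)
```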