What's wrong with this python program working on .csv?

Question

I have a text file with a list of strings.

I want to search a .csv file for rows that begin with those strings and put them in a new .csv file.

In this instance, the text file is called 'output.txt', the original .csv is 'input.csv' and the new .csv file is 'corrected.csv'.

The code:

import csv

file = open('output.txt')
while 1:
    line = file.readline()
    writer = csv.writer(open('corrected.csv','wb'), dialect = 'excel')
    for row in csv.reader('input.csv'):
        if not row[0].startswith(line):
            writer.writerow(row)
    writer.close()
    if not line:
        break
    pass

The error:

Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 9, in <module>
writer.writerow(row)
TypeError: 'str' does not support the buffer interface

New error:

Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
for row in reader:
_csv.Error: line contains NULL byte

The problem was that the CSV file had been saved with tabs instead of commas. The new issue is the following:

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 13, in <module>
    if row[0] not in lines:
IndexError: list index out of range

The CSV file has 500+ entries of data... does this make a difference?
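For a file that turns out to be tab-separated rather than comma-separated, `csv.reader` accepts a `delimiter` argument. The following is only a minimal sketch: the sample data is made up, and an in-memory `io.StringIO` stands in for `input.csv`.

```python
import csv
import io

# Hypothetical sample standing in for a tab-separated input.csv.
sample = io.StringIO("abc123\tfirst\nxyz789\tsecond\n", newline="")

# Default (comma) parsing leaves each line as a single field.
comma_rows = list(csv.reader(sample))

sample.seek(0)
# Naming the tab delimiter splits each line into its fields.
tab_rows = list(csv.reader(sample, delimiter="\t"))

print(comma_rows[0])  # one field holding the whole line
print(tab_rows[0])    # two separate fields
```

With a real file, the same `delimiter="\t"` argument would go on the `csv.reader(...)` call that wraps the opened file.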

`row[0]` can never `startswith(line)` because `line` will always have a newline character and `row[0]` will never have one. – Steven Rumbalski, Oct 21 '11 at 18:51
[Maybe this one post is related? : TypeError: 'str' does not support the buffer interface](http://stackoverflow.com/questions/5471158/typeerror-str-does-not-support-the-buffer-interface) — Grzegorz Wierzowiecki, Jan 14 '12 at 22:11
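The newline point in the first comment can be checked in isolation. This is a small sketch with invented values: `io.StringIO` stands in for `output.txt`, and `field` plays the role of `row[0]`.

```python
import io

# Hypothetical contents standing in for output.txt.
prefix_file = io.StringIO("abc\nxyz\n")

raw_line = prefix_file.readline()    # readline() keeps the trailing newline
clean_line = raw_line.rstrip("\n")   # same text with the newline removed

field = "abc123"  # a first-column value as csv.reader would return it

print(repr(raw_line))
print(field.startswith(raw_line))    # False: the newline never matches
print(field.startswith(clean_line))  # True once the newline is stripped
print(field.startswith(""))          # True: an empty string is a prefix of anything
```

The last line matters for the loop in the question: at end of file `readline()` returns an empty string, and every row "starts with" it.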

Blender · Accepted Answer · 2011-10-21T19:08:52.923

6

If you look at the documentation, this is how the reader is initialized:

spamReader = csv.reader(open('eggs.csv', 'r'), ...

Notice the `open('eggs.csv', 'r')`. You aren't passing a file handle in line 9, only the file name, so the str is being treated as if it were the file handle, and that is what throws the error.

Replace line 9 with this:

csv.reader(open('input.csv', 'r', newline = ''))
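To see what the reader does with a bare file name, here is a minimal sketch. The sample contents are invented, and `io.StringIO` stands in for the object that `open('input.csv', 'r', newline='')` would return.

```python
import csv
import io

# Passing the file name itself: the reader iterates over the string,
# treating each character as a separate line of input.
rows_from_name = list(csv.reader("input.csv"))

# Passing a file-like object: the reader parses the actual contents.
fake_file = io.StringIO("abc123,first\nxyz789,second\n", newline="")
rows_from_file = list(csv.reader(fake_file))

print(rows_from_name)
print(rows_from_file)
```

No exception is raised in the first case, which is why this mistake is easy to miss: the reader simply produces one single-character row per letter of the name.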

edited Oct 21 '11 at 19:08

answered Oct 21 '11 at 18:27


OP is using Python 3.2, which does not have the binary mode requirement. The docs say to open the file thusly: `open('input.csv', 'r', newline='')`. See docs.python.org/py3k/library/csv.html. – Steven Rumbalski Oct 21 '11 at 19:07
Good point. Maybe the OP will stumble upon your comment, but for now, I'll just edit it into the code. – Blender Oct 21 '11 at 19:08

Spencer Rathbun · Answer 2 · 2011-10-21T20:36:59.933

2

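The csv.reader can't open a file by name; it takes a file object. A better solution would be this:

import csv

# Read the search strings, dropping the trailing newline from each line.
with open('output.txt', 'r') as f:
    lines = [line.rstrip('\n') for line in f]

# newline='' is what the Python 3 csv docs recommend for csv files.
with open('corrected.csv', 'w', newline='') as correct:
    writer = csv.writer(correct, dialect='excel')
    with open('input.csv', 'r', newline='') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            # Keep only rows whose first field is not one of the search strings.
            if row[0] not in lines:
                writer.writerow(row)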

edited Oct 21 '11 at 20:36

answered Oct 21 '11 at 18:27


`Traceback (most recent call last): File "C:\Python32\Sample Program\csvParser.py", line 12, in <module> for row in reader: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)` – James Roseman Oct 21 '11 at 18:32
@JamesRoseman erm, no. The open statement has 'rb' in it for read binary mode. I'm beginning to suspect your data file is corrupt in some way. As noted by Blender, the csv lib uses binary file handles. – Spencer Rathbun Oct 21 '11 at 18:43

`for line in f.readlines(): lines.append(line)` is a wordy way to say `lines = f.readlines()`. – Steven Rumbalski Oct 21 '11 at 18:55
@Spencer Rathbun That was the error message I got, it wasn't my criticism. I really appreciate the help on an issue I have little to no experience with, so thank you. I'm positive the data isn't corrupted, but might it be the way that it's formatted that would cause this type of error? – James Roseman Oct 21 '11 at 18:55
Also, `row[0] not in lines` will always evaluate to true because each item in `lines` will end in a newline character, whereas `row[0]` never will. – Steven Rumbalski Oct 21 '11 at 18:56

@Spencer Rathbun: OP is using Python 3.2, which does not have the binary mode requirement. The docs say to open the file thusly: `open('input.csv', 'r', newline='')`. See docs.python.org/py3k/library/csv.html. – Steven Rumbalski Oct 21 '11 at 19:06
@StevenRumbalski Ah, I use 2.7 and did not notice the version number in the path listing. And good catch on the row bit, I've fixed that. – Spencer Rathbun Oct 21 '11 at 19:27
@JamesRoseman No problem. As Steven pointed out, Python 3.2 does not have the binary requirement. If you change the `rb` and `wb` to `r` and `w` that may fix it. Also note the fix I added for the newline issue Steven noted. – Spencer Rathbun Oct 21 '11 at 19:31
You should use the term "file object" rather than "file descriptor". The former is what is returned by the builtin open() function. The latter is what is returned by os.open(). – Raymond Hettinger Oct 21 '11 at 20:06
I've got a new error now, something about NULL? I'm so confused and lost on this topic... I read the documentation over and over and can't really make heads or tails of it, but I need to implement this code in the next day or so. – James Roseman Oct 25 '11 at 19:21
@JamesRoseman If you have a new error, ask a new question. If *this* question is answered, please select the correct answer, edit your question to provide more detail, or add your own answer if you have a solution. – Spencer Rathbun Oct 25 '11 at 20:39

score 0 · Answer 3 · answered Oct 25 '11 at 21:31

Your latest problem:

    if row[0] not in lines:
IndexError: list index out of range

The error message mentions a list index.
There is only one list index that it could be talking about: 0
If 0 is out of range, then len(row) must be zero.
If len(row) is zero, then the corresponding line in the input file must be empty.
If a line in the input file is empty, what do you want to do:

(a) ignore the input line altogether?
(b) raise a (fatal) error?
(c) log an error message somewhere and keep going?
(d) something else?
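As an illustration of option (a), here is a minimal sketch that skips empty rows and counts them. The data is invented, `io.StringIO` objects stand in for `input.csv` and `corrected.csv`, and `lines` and `skipped` are names chosen for this example.

```python
import csv
import io

# Hypothetical stand-ins for the search strings and the two csv files.
lines = {"abc123"}
source = io.StringIO("abc123,drop me\n\nxyz789,keep me\n", newline="")
target = io.StringIO(newline="")

writer = csv.writer(target, dialect="excel")
skipped = 0
for row in csv.reader(source):
    if not row:          # csv.reader yields [] for a blank line
        skipped += 1
        continue
    if row[0] not in lines:
        writer.writerow(row)

print(skipped)
print(repr(target.getvalue()))
```

Option (b) would replace the `continue` with a `raise`, and option (c) would log the line number (available as `reader.line_num` when the reader is kept in a variable) before continuing.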

score -2 · Answer 4 · answered Oct 21 '11 at 18:39

Try this

import csv
import cStringIO

file = open('output.txt')
while True:
    line = file.readline()
    buf = cStringIO.StringIO()
    writer = csv.writer(buf, dialect = 'excel')
    for row in csv.reader(open('input.csv')):
        if not row[0].startswith(line):
            writer.writerow(row)
    writer.close()
    output = open('corrected.csv', 'wb')
    output.write(buf.getvalue())
    if not line:
        break
    pass

In my experience, using a cStringIO buffer for the whole process and then dumping the entire buffer into a file is faster.

-1. cStringIO is a pointless complication. The question wasn't about his code being too slow. Premature optimization like this is a waste of time. — Steven Rumbalski, Oct 21 '11 at 19:09
