Why is csvreader for Python starting good then producing NULL bytes?

Question

OKay so I am reading an excel workbook. I read the file for a while and it started off a .csv after debugging and doing other things below the code i am showing you it changed to a xlsx I started getting IOError no such file or directory. I figured out why and changed FFA.csv to FFA.xlsx and it worked error free. Then I started doing other things and debugging. Got up this morning and now i get the following Error : line contains NULL byte. weird because the code started out good. Now it can't read. I put in the print repr() to debug and it infact now prints NULL bytes. So how do i fix this and prevent it in the future? here is the 1st 200 bytes:

PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

import csv

def readFile():
    count = 0
    print repr(open("FFA.xlsx", "rb").read(200)) #dump 1st 200 bytes
    with open("FFA.xlsx","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
 readFile()

uh, .xlsx is not a csv file. it's a zip file containing multiple xml and other files. you're reading binary garbage. — Marc B, Aug 20 '15 at 14:18
okay well like i said i only changed it because it seem to change itself however it was working fine for a while. I changed it back to a .csv but still get the error message. However the first 200 bytes has changed to — Jonathan Holloway, Aug 20 '15 at 14:26
This is the first 200 bytes now 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,upper,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' — Jonathan Holloway, Aug 20 '15 at 14:26
As an additional suggestion, you should get the call to readFile() method after the usual block for main: ` if __name__ == '__main__': readFile() ` — Alberto, Aug 20 '15 at 14:50
As pointed out by mydaemon and others, you really should use the right tool for the job. xlrd does a great job in this sort of scenario. — John T, Aug 21 '15 at 00:54

Alberto · Answer 1 · 2015-08-20T15:23:06.343

1

The code you have posted have a small but dangerous mistake, since you are leaking the file handle by opening it twice.

1) You are opening the file and reading 200 bytes from it, but not closing it. 2) You are then opening the file the proper way, via a context manager, which in fact could read anything from it.

Some questions that may help you to debug the problem:

Is the file you are opening stored in a network'd resource? (CIFS, NFS, etc)
Have you checked the file is not opened by another process? lsof can help you to check that.
Is this running on windows or Linux? Can you test in under linux, if it happens in windows, and viceversa?

I forgot to mention that you should not use CSV for anything related to Excel, even when the file seems to be a CSV data-wise. Use XLRD module (https://pypi.python.org/pypi/xlrd) , it's cross-platform and opens and reads perfectly fine both XSL and XSLX files since version 0.8.

This little piece of code will show you how to open the workbook and parse it in a basic manner:

import xlrd  

def open_excel():  
  with xlrd.open_workbook('FFA.xlsx') as wb:  
      sh = wb.sheet_by_name('Sheet1')  
      for rownum in xrange(sh.nrows):  
        [Do whatever you need here]

edited Aug 20 '15 at 15:23

answered Aug 20 '15 at 14:53

Alberto

687
6
21

Well I am doing in Windows, and the major reason why I open it the first time was for debugging purposes to see the null bytes thats not in the normal code. I also didn't experience this problem originally. No I can't test in Linux. And it is important to point out after chaning to a .csv file the bytes changed to 'playerId,playername,position,team,points,overallECR,overallRank,positionRank,up‌per,lower,risk,swing\r\n100029,49ers,DST,SF,104.896,245,979,16,165.7802,92.0755,4‌.0319,73.7047\r\n2543501,Aaron Colvin,DB,JAC' this means \r\n is produced in a cell with no data in it at all. – Jonathan Holloway Aug 20 '15 at 14:57
Please see my comment since I edited my answer; try with xlrd, it should work straight away. – Alberto Aug 20 '15 at 14:59
so what must be typed to open the file using xlrd? – Jonathan Holloway Aug 20 '15 at 15:00
The example below only covers your question about opening the file xlrd. For further information, check the documentation (https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966) or search here in StackOverflow, there are dozens of questions and answers about using it. I've added the example in the original answer. – Alberto Aug 20 '15 at 15:14
Okay so it prints the row numbers how do i access the values from rows? – Jonathan Holloway Aug 20 '15 at 15:42

score 0 · Answer 2 · answered Aug 20 '15 at 14:36

I agree with Marc, I did a training exercise importing an excel file and I think pandas library would help in that case where you can import pandas as pd and use pd.read_excel(file_name) as part of a data_processing function like read_file() post import.

score 0 · Answer 3 · answered Aug 20 '15 at 15:12

So this is what I did. But I am intersted in learning the xlrd method i have the module but no documentation. This works no error messages. Still not sure why it changed from .csv to xlsx but its working now. What is the script like in xlrd?

import csv
def readFile():
    count = 0
    #print repr(open("FFA.csv", "rb").read(200)) #dump 1st 200 bytes check if null values produced.
    with open("FFA.csv","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low =row[9]
            risk = row[10]
            swing = row[11]
readFile()

Please check my original answer, I've added the example on how to open the file and perform the minimal parsing. If you are interested on parsing it as csv, you can convert from xslx to CSV using xlrd, and then parse it with CSV. See the answer to this question: http://stackoverflow.com/questions/17314062/how-to-convert-xml-to-csv-using-xsl — Alberto, Aug 20 '15 at 15:25
Sorry, the link above is wrong, this is the right one: http://stackoverflow.com/questions/9884353/xls-to-csv-convertor — Alberto, Aug 20 '15 at 15:32

Why is csvreader for Python starting good then producing NULL bytes?

3 Answers3