Getting error when using split()

Question

I have a poorly formatted file. If I try to open it with a plain open('data.csv', 'r'), I get:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4681: character maps to <undefined>

So I used open('data.csv', 'rb') instead, which works fine. I then tried, without success, to copy the information I need to a new file using:

with open('data.csv', 'rb') as file, open('new.csv', 'w') as newf:
    for f in file:
        newf.write(str(f.split(',')[0:5]))

If I remove the split(), the data is written to the new file fine, but with the split(), which I'm using to extract the first few columns, I get:

TypeError: 'str' does not support the buffer interface

I tried the suggestions in the question "TypeError: 'str' does not support the buffer interface", but none of them helped.

How else can I prevent the TypeError from being raised?

What version of python are you using? Can you try python3, with r, not rb. — pvg, Jul 12 '15 at 17:39
It is python3, if I do without 'rb' the file won't open at all. — Leb, Jul 12 '15 at 17:42
That is your problem, not the type error. You need to figure out what the encoding of the file is and then pass the encoding on open, like so `open(filename, encoding="utf8")`. changing to rb simply masks your root problem. — pvg, Jul 12 '15 at 17:47

score 0 · Accepted Answer · answered Jul 12 '15 at 17:50


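Your file is failing to decode with the default encoding, which on Windows is typically a locale code page (hence the 'charmap' codec in the error message). Find out how the file is actually encoded and pass that to open through the encoding keyword argument, e.g. open('data.csv', 'r', encoding='utf8'). Opening the file in 'rb' mode only masks the problem: each line is then a bytes object, and calling split(',') on bytes with a str separator is what raises the TypeError. An easy way to check the encoding is to open the file in a decent text editor such as Notepad++, Sublime, or BBEdit; the editor will make a reasonable effort to detect it.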
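As a rough sketch of what this looks like in practice, assuming the file turns out to be UTF-8 (substitute whatever your editor reports): the helper name `copy_first_columns` and its `n_columns` parameter are made up for illustration, and using the csv module instead of a manual split(',') is my own suggestion rather than something taken from the question.

```python
import csv


def copy_first_columns(src_path, dst_path, encoding="utf8", n_columns=5):
    """Copy the first n_columns fields of every row from src_path to dst_path.

    encoding="utf8" is an assumption; pass the file's real encoding.
    """
    # Text mode with an explicit encoding yields str rows, so no bytes/str mix-up.
    # newline="" is the setting the csv module documentation recommends.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Slicing is safe even when a row has fewer than n_columns fields.
            writer.writerow(row[:n_columns])
```

Calling `copy_first_columns('data.csv', 'new.csv')` would then write the first five columns of each row. If you really do need to stay in 'rb' mode, the separator has to be bytes as well, i.e. `line.split(b',')`, but then you are still working with undecoded data.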

pvg


I thought I did try `open('data.csv', 'r', encoding='utf8')` but I guess I overlooked it. That fixed the problem. – Leb Jul 13 '15 at 02:51
