I have a poorly formatted file. If I try to open it with a plain open('data.csv', 'r'), I get:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in 
    position 4681: character maps to <undefined>

So I used open('data.csv', 'rb') instead, and it works fine. Then I tried, without success, to copy the needed information to a new file using:

with open('datacsv', 'rb') as file, open('new.csv', 'w') as newf:
    for f in file:
        newf.write(str(f.split(',')[0:5]))

If I remove the split(), the data is written to the new file fine, but when I add the split, which I'm using to extract the first few columns, I get:

TypeError: 'str' does not support the buffer interface

I tried the suggestions in the question "TypeError: 'str' does not support the buffer interface", but none of them helped.

How else can I prevent the TypeError from being raised?

Leb
  • What version of python are you using? Can you try python3, with r, not rb. – pvg Jul 12 '15 at 17:39
  • It is python3, if I do without 'rb' the file won't open at all. – Leb Jul 12 '15 at 17:42
  • That is your problem, not the type error. You need to figure out what the encoding of the file is and then pass that encoding to open, like so: `open(filename, encoding="utf8")`. Changing to rb simply masks your root problem. – pvg Jul 12 '15 at 17:47

1 Answer

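Your file is failing to decode with the default encoding. Opening it in binary mode only hides that: each line is then a bytes object, and calling split(',') on bytes with a str separator is what raises the TypeError. You should find out how the file is encoded and then pass that encoding to open via the encoding keyword argument. An easy way to check the encoding is to open the file in a decent text editor such as Notepad++, Sublime, BBEdit, etc. The editor will make a reasonable effort to detect the encoding.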
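As a rough sketch of what that could look like for the code in the question: the function name copy_first_columns and its parameters are made up for illustration, and the "utf8" default is only an assumption, so substitute whatever encoding your editor reports. It also swaps the manual split(',') for the standard csv module, which handles quoted fields.

```python
import csv


def copy_first_columns(src_path, dst_path, encoding="utf8", n_columns=5):
    """Copy the first n_columns of every CSV row from src_path to dst_path.

    The encoding default is an assumption; pass the file's real encoding.
    """
    # Text mode with an explicit encoding yields str lines, not bytes.
    # newline="" is what the csv module documentation recommends for files
    # handed to csv.reader / csv.writer.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Slicing keeps at most n_columns fields; shorter rows pass through.
            writer.writerow(row[:n_columns])
```

Calling copy_first_columns('data.csv', 'new.csv') would then write the first five columns of each row. If the encoding is wrong, open still raises UnicodeDecodeError, which is a useful signal that the guess needs revisiting.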

pvg
  • I thought I did try `open('datacsv', 'r',encoding='utf8')` but I guess I overlooked it. That fixed the problem. – Leb Jul 13 '15 at 02:51