
A few weeks ago I wrote a CSV parser in Python, and it worked great with the text file I was given. But when we tried to test it with other files, the problems started.

First was the

ValueError: empty string for float()

for a string like "313.44". The problem was that the file was Unicode-encoded, so there were null bytes ('\x00') between the digits.
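To illustrate where those null bytes come from, here is a minimal sketch. It assumes the file was UTF-16 little-endian, which the question does not state outright; the variable names are only for illustration.

```python
# Assumption: the number was stored as UTF-16 little-endian text.
raw = u"313.44".encode("utf-16-le")

# Every ASCII character is followed by a zero byte, so the raw bytes
# contain the '\x00' values described above.
has_nulls = b"\x00" in raw

# Decoding with the matching codec removes the padding, and float()
# then accepts the text.
value = float(raw.decode("utf-16-le"))
```

Passing the undecoded bytes to float() is what fails; decoding first is the fix.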

OK, so I decided to read it as Unicode with

codecs.open(filename, 'r', 'utf-16')

And then all hell broke loose: missing BOMs, problems with line-ending characters (LF vs. CR+LF), and so on.

So can you suggest a workaround, or give me a hint, for parsing both Unicode and non-Unicode files when I do not know in advance what the encoding is, whether a BOM is present, what the line endings are, and so on?

P.S. I am using Python 2.7

Ilian Iliev

2 Answers


The problem was solved by using the csv module, as proposed by Daenyth.
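For anyone landing here later, this is a rough sketch of how the csv module can be combined with BOM sniffing. It is not necessarily the code that was actually used. The names `guess_encoding` and `read_csv_bytes` are hypothetical, the UTF-8 fallback is an assumption, and the BOM-less check is only a heuristic that assumes mostly ASCII content. It is written to run on Python 2.7 as well as Python 3.

```python
import codecs
import csv
import io
import sys

# UTF-32 marks are tested before UTF-16 ones, because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, default="utf-8"):
    """Guess a codec name from a BOM or, failing that, from null bytes."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name  # these codecs consume the BOM while decoding
    sample = raw[:1024]
    if b"\x00" in sample:
        # Heuristic: ASCII text in UTF-16-LE has zeros at odd offsets,
        # in UTF-16-BE at even offsets.
        even = sample[0::2].count(b"\x00")
        odd = sample[1::2].count(b"\x00")
        return "utf-16-le" if odd > even else "utf-16-be"
    return default  # assumption: anything else is treated as UTF-8


def read_csv_bytes(raw):
    """Decode raw file bytes and return the CSV rows as lists of text."""
    text = raw.decode(guess_encoding(raw))
    # Normalize CR+LF and lone CR to LF (this also affects line breaks
    # inside quoted fields).
    text = text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")
    if sys.version_info[0] < 3:
        # The Python 2 csv module only handles byte strings reliably.
        reader = csv.reader(io.BytesIO(text.encode("utf-8")))
        return [[cell.decode("utf-8") for cell in row] for row in reader]
    return list(csv.reader(io.StringIO(text)))
```

Usage would be something like `rows = read_csv_bytes(open(filename, 'rb').read())`. Reading the file in binary mode matters here, since the detection works on the raw bytes.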

Ilian Iliev

It mainly depends on the Python version you are using, but these two links should help you out:

Community
Moss