
I have a short Python script for reading CSV files. So far nothing special.

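import csv

def csvReader(filename):
    # newline='' is how the csv docs recommend opening files; the with-block closes the file
    with open(filename, 'rt', encoding='utf-8', newline='') as f:
        yield from csv.reader(f)

for row in csvReader('test.csv'):
    print(row)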

However, it raises an error if the CSV file contains the word "NUL" in a field, for example:

timestamp_utc,id, text,key
2021-07-15 13:47:01,12345,"Some text,sfghj",z3O9ZNULdxBfR

When I read this CSV file (which I did not create; it is delivered externally), I get this error message:

_csv.Error: line contains NUL
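The message refers to the NUL character, a zero byte (`\x00`), rather than the three letters N, U, L. A minimal sketch for telling the two apart, assuming the file fits in memory: read it in binary mode and list every line that contains a zero byte, using `repr` so the invisible character becomes visible. The helper name `find_nul_lines` is hypothetical.

```python
from pathlib import Path


def find_nul_lines(path):
    """Return (line_number, repr_of_line) for each line that contains a zero byte.

    Hypothetical helper: the file is read in binary mode, so nothing is decoded
    or normalised before the check.
    """
    data = Path(path).read_bytes()
    hits = []
    # bytes.splitlines splits on \n, \r and \r\n; keepends=True keeps the raw endings
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        if b"\x00" in line:
            # repr() shows the zero byte as \x00 instead of an invisible character
            hits.append((number, repr(line)))
    return hits
```

For the sample row above, `find_nul_lines('test.csv')` would report nothing, because `z3O9ZNULdxBfR` only contains the letters; a line with a zero byte inside the key would be listed with `\x00` visible in its `repr`.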

Curiously, if I open the file in the Windows editor (Notepad) and save it without changing anything, the error does not occur and the file is processed normally.

Environment: Windows, PyCharm, Python 3.8

Mephisto
  • If you have no control over the origin of the CSV file, you may need to preprocess it. Open it in binary mode ('rb'), read the entire contents into memory and search for the NUL character. I don't think NUL is a string; I think it's a binary zero. You can then either remove it or replace it with something else. –  Aug 04 '21 at 07:56
  • Can you please provide a [mcve]? A short sample csv should be sufficient. Are you absolutely sure the line contains the *string* ``NUL`` and not the [*symbol* ``NUL``](https://en.wikipedia.org/wiki/Null_character) aka ``\x00``? You can look at the ``repr`` of a line to get an unambiguous representation. – MisterMiyagi Aug 04 '21 at 08:02
  • @Paul: Yes, dat held the CSV file name; I have changed it in the question. But the file is found and read correctly (until the error). As described, I also get the error only with the original file. If I create a new file with the same content, or open the original in the Windows editor and save it, it is processed normally. That is my problem: I have to process the original. – Mephisto Aug 04 '21 at 08:04
  • @MisterMiyagi I have tried that. The problem is that I can't post the original file, and as soon as I copy a part of it into a new file and save it, the problem no longer occurs. But I am sure that the word NUL simply appears inside a string. Of 90 CSV files, the problem appears in only 2, the ones where the "key" value happens to contain the character sequence "NUL" or "NULL". – Mephisto Aug 04 '21 at 08:13
  • That error indicates that [the file contains a NUL character](https://github.com/python/cpython/blob/ac811f9b5a68ce8756911ef2c8be83b46696018f/Modules/_csv.c#L878), i.e. a byte which is zero, not that it contains the text "NUL". (As several others have guessed; I'm just writing this to add the link to the source code.) – Ture Pålsson Aug 04 '21 at 08:14
  • @TurePålsson Thanks for the clarification. But why is the string NUL interpreted as a zero byte, and why does this change after simply opening and saving the file in the Windows editor? If I write the file line by line from Python into a new file with UTF-8 encoding, unfortunately nothing changes and the same error occurs. – Mephisto Aug 04 '21 at 08:50
  • The *string* ``NUL`` is not interpreted as zero-byte, but zero-byte is often displayed as ``NUL``. Many text editors will just remove or normalise invisible symbols, just like adjusting newline indicators to the system standard. – MisterMiyagi Aug 04 '21 at 08:52
  • @MisterMiyagi: Sorry to insist, but I have already tried many solutions for zero bytes without success. In my case, it really is part of the string. I have many files that all have a generated key of 44 characters. The error appears only if this key happens to contain the character sequence NUL. All other files are processed without errors. I do not believe that only these files happen to contain an invisible symbol. – Mephisto Aug 04 '21 at 09:07
  • I suspect it must have something to do with the default formatting of the original file (which I can't change), because the error disappeared after opening and saving it in the editor. Unfortunately I don't know how to check this. Are there any ideas in this direction? – Mephisto Aug 04 '21 at 09:11
  • As mentioned before, please provide a [mcve] if you feel this does not match what people have described here. I can only reproduce the issue when there is a literal NUL *byte* in the content. Note that NUL bytes may occur due to other reasons, e.g. using the wrong encoding – which is also something an editor may recover. – MisterMiyagi Aug 04 '21 at 09:13
  • OK, I have solved the problem. Contrary to my assumption, I had been misled by a correlation: exactly the files with the NUL substring in the text also contained NUL bytes. Why is not clear to me, but the error was actually there. Thanks to all who persistently pointed out that my assumption was wrong ;-D – Mephisto Aug 04 '21 at 10:07
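Because the problem showed up in only 2 of 90 files, it can help to list the affected files before parsing anything. A minimal sketch, assuming all files sit in one directory and match `*.csv`; the helper name `files_with_nul_bytes` is hypothetical. Files are read in binary mode and in chunks, so nothing is decoded and large files are not loaded into memory at once.

```python
from pathlib import Path


def files_with_nul_bytes(directory, pattern="*.csv", chunk_size=64 * 1024):
    """Return the sorted paths in `directory` matching `pattern` that contain a zero byte."""
    matches = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            # Read fixed-size chunks; a single zero byte cannot straddle two chunks
            while chunk := f.read(chunk_size):
                if b"\x00" in chunk:
                    matches.append(path)
                    break
    return matches
```

For example, `files_with_nul_bytes('.')` would list the CSV files in the current directory that are likely to trigger `line contains NUL` on Python 3.8.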
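The preprocessing suggested in the first comment (open in binary mode, search for the NUL character, remove or replace it) can be sketched as below. It assumes the file fits in memory and is otherwise valid UTF-8; `read_csv_without_nul` and its `replacement` parameter are hypothetical names. It also returns how many zero bytes were found, so the cleaning does not happen silently.

```python
import csv
import io


def read_csv_without_nul(path, replacement=b"", encoding="utf-8"):
    """Parse a CSV file after removing (or replacing) every zero byte.

    Hypothetical helper: returns the parsed rows and the number of zero bytes found.
    """
    # Binary mode: the raw bytes are inspected before any decoding happens
    with open(path, "rb") as f:
        raw = f.read()
    nul_count = raw.count(b"\x00")
    cleaned = raw.replace(b"\x00", replacement)
    text = cleaned.decode(encoding)
    # newline="" leaves the line endings for the csv module to handle
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return rows, nul_count
```

If the count is large and the zeros sit between every other character, the file is more likely stored in a different encoding, and decoding it correctly is preferable to deleting bytes.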
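As noted in the comments, zero bytes can also come from a wrong encoding: in UTF-16, for example, every ASCII character such as `N`, `U` or `L` is stored as two bytes, one of which is zero. A rough heuristic for spotting that case, assuming the file either starts with a byte order mark or is mostly ASCII text; `guess_encoding` and its 40% threshold are assumptions for illustration, not a standard algorithm.

```python
import codecs


def guess_encoding(raw):
    """Heuristic guess of an encoding from raw bytes (hypothetical helper)."""
    # Check UTF-32 before UTF-16: the UTF-32-LE BOM starts with the UTF-16-LE BOM
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Without a BOM, mostly-ASCII UTF-16 text has a zero byte in every second
    # position, while UTF-8 text only contains zero bytes for U+0000 itself.
    sample = raw[:4096]
    half = len(sample) // 2
    odd_zeros = sample[1::2].count(0)
    even_zeros = sample[0::2].count(0)
    if half and odd_zeros > half * 0.4:
        return "utf-16-le"
    if half and even_zeros > half * 0.4:
        return "utf-16-be"
    return "utf-8"
```

The result could then be used as `raw.decode(guess_encoding(raw))` before parsing. An editor that detects the encoding and saves the file again would make such zero bytes disappear too, which is one possible reading of the Notepad observation.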
