I am trying to fix an issue with null bytes in a CSV file.
The csv_file object is passed in from a different function in my Flask application:
stream = codecs.iterdecode(csv_file.stream, "utf-8-sig", errors="strict")
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")
for row in dict_reader: # Error is thrown here
...
The error thrown in the console is _csv.Error: line contains NULL byte.
So far, I have tried:
- different encoding types (I checked the encoding type and it is utf-8-sig)
- using .replace('\x00', ''), but I can't seem to get these null bytes removed.
I would like to remove the null bytes and replace them with empty strings, but I would also be okay with skipping any row that contains null bytes. I am unable to share my CSV file.
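For the fallback of skipping affected rows, here is a minimal sketch. It assumes that each CSV record sits on a single physical line (no quoted fields spanning lines) and that the header line itself is clean. The function name read_rows_skipping_nulls and the sample bytes are made up for illustration; byte_stream stands in for a binary stream such as csv_file.stream.

```python
import codecs
import csv
import io


def read_rows_skipping_nulls(byte_stream):
    """Parse CSV rows from a binary stream, skipping lines that contain NUL bytes."""
    # Filter on the raw bytes, before decoding or CSV parsing takes place.
    # Assumption: one record per physical line and a clean header line.
    clean_lines = (line for line in byte_stream if b"\x00" not in line)
    text_lines = codecs.iterdecode(clean_lines, "utf-8-sig", errors="strict")
    reader = csv.DictReader(text_lines, skipinitialspace=True, restkey="INVALID")
    return list(reader)


# Hypothetical sample: the second line carries an embedded NUL byte.
sample = b"name,city\nAl\x00ice,Paris\nBob,Rome\n"
skipped_rows = read_rows_skipping_nulls(io.BytesIO(sample))
print(skipped_rows)
```

Filtering on bytes keeps the check independent of how the csv module reacts to null characters, which may differ between Python versions.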
EDIT: The solution I reached:
content = csv_file.read()
# Converting the above object into an in-memory byte stream
csv_stream = io.BytesIO(content)
# Iterating through the lines and replacing null bytes with an empty string
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)
# Below remains unchanged, just passing in fixed_lines instead of csv_file.stream
stream = codecs.iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")
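For anyone who wants to try this without a Flask upload, the same idea can be sketched as a self-contained script. The function name read_rows_without_nulls and the sample bytes are invented for illustration; in the application, the bytes would come from csv_file.read() as above.

```python
import codecs
import csv
import io


def read_rows_without_nulls(byte_stream):
    """Parse CSV rows from a binary stream, removing NUL bytes first."""
    # Strip NUL bytes from each raw line before any decoding happens.
    fixed_lines = (line.replace(b"\x00", b"") for line in byte_stream)
    # Decode lazily; "utf-8-sig" also drops a leading byte order mark.
    text_lines = codecs.iterdecode(fixed_lines, "utf-8-sig", errors="strict")
    reader = csv.DictReader(text_lines, skipinitialspace=True, restkey="INVALID")
    return list(reader)


# Hypothetical sample data with a BOM and one embedded NUL byte.
sample = b"\xef\xbb\xbfname,city\nAl\x00ice,Paris\nBob,Rome\n"
rows = read_rows_without_nulls(io.BytesIO(sample))
print(rows)
```

The replacement uses bytes on both sides (b'\x00' and b'') because the lines are still undecoded at that point; passing str arguments to bytes.replace raises a TypeError.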