I have multiple gzfile in subfolders that I want to unzip in one folder. It works fine but there's a BOM signature at the beginning of each file that I would like to be removed. I have checked other questions like Removing BOM from gzip'ed CSV in Python or Convert UTF-8 with BOM to UTF-8 with no BOM in Python but it doesn't seem to work. I use Python 3.6 in Pycharm on Windows.
Here's first my code without attempt:
import gzip
import pickle
import glob
def save_object(obj, filename):
with open(filename, 'wb') as output: # Overwrites any existing file.
pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)
output_path = 'path_out'
i = 1
for filename in glob.iglob(
'path_in/**/*.gz', recursive=True):
print(filename)
with gzip.open(filename, 'rb') as f:
file_content = f.read()
new_file = output_path + "z" + str(i) + ".txt"
save_object(file_content, new_file)
f.close()
i += 1
Now, with the logic defined in Removing BOM from gzip'ed CSV in Python (at least what I understand of it) if I replace file_content = f.read()
by file_content = csv.reader(f.read().decode('utf-8-sig').encode('utf-8').splitlines())
, I get:
TypeError: can't pickle _csv.reader objects
I checked for this error (e.g. "Can't pickle <type '_csv.reader'>" error when using multiprocessing on Windows) but I found no solution I could apply.