Edit: the question "Convert UTF-8 with BOM to UTF-8 with no BOM in Python", which only works on txt files, does not solve my issue with csv files.
I have two csv files:
rtc_csv_file="csv_migration\\rtc-test.csv"
ads_csv_file="csv_migration\\ads-test.csv"
Here is the ads-test.csv file (the one causing issues):
https://easyupload.io/bk1krp
According to the status bar in the bottom right corner of VS Code, the file is encoded as UTF-8 with BOM.
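To double-check what the editor reports, the first few bytes of the file can be inspected directly. This is only a minimal sketch: `detect_bom` and `detect_file_bom` are hypothetical helper names of mine, and the labels they return are just descriptive strings, not codec names.

```python
import codecs

# Known byte-order marks and a descriptive label for each (labels are my own).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 BOM"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),  # FE FF
]

def detect_bom(raw_bytes):
    """Return a label for the BOM at the start of raw_bytes, or None if there is none."""
    for bom, label in BOMS:
        if raw_bytes.startswith(bom):
            return label
    return None

def detect_file_bom(path):
    """Read only the first few bytes of a file in binary mode and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling `detect_file_bom(ads_csv_file)` and `detect_file_bom(rtc_csv_file)` should show whether the two files really start with different byte-order marks.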
I am trying to write a Python function that reads in every row and converts it to a dict object. My function works just fine for the first file, rtc-test.csv, but for the second file, ads-test.csv, I get the error "UTF-16 stream does not start with BOM" when I use utf-16. So I have tried utf-8 and utf-8-sig, but then each line is read in as a single string with commas separating the values. I can't split by comma because some column values will include commas.
My Python code correctly reads in rtc-test.csv as a list of values. How can I read in ads-test.csv as a list of values when the csv is encoded as UTF-8 with BOM?
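To make the symptom reproducible without the real file, here is a minimal sketch using made-up in-memory data. It assumes ads-test.csv is comma-delimited with quoted values (which I have not confirmed); the `Title` column, the sample row, and the `read_rows` helper are invented for illustration.

```python
import csv
import io

# Made-up stand-in for ads-test.csv: UTF-8 with BOM, comma-delimited,
# with one quoted value that itself contains a comma.
sample_bytes = 'ID,Title\r\n1,"Hello, world"\r\n'.encode("utf-8-sig")

def read_rows(raw_bytes, encoding="utf-8-sig", delimiter=","):
    """Decode raw bytes with the given encoding and parse them with csv.reader."""
    text_stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    with text_stream:
        return list(csv.reader(text_stream, delimiter=delimiter))

# utf-8-sig strips the BOM; a comma delimiter keeps the quoted comma inside one field.
rows_comma = read_rows(sample_bytes, encoding="utf-8-sig", delimiter=",")

# With a tab delimiter, every line comes back as one single string, as described above.
rows_tab = read_rows(sample_bytes, encoding="utf-8-sig", delimiter="\t")

# Plain utf-8 leaves the BOM attached to the first header as '\ufeff'.
rows_plain = read_rows(sample_bytes, encoding="utf-8", delimiter=",")
```

If the real file behaves like this sample, the one-string-per-line result would come from the delimiter argument rather than from the encoding.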
code:
rtc_csv_file="csv_migration\\rtc-test.csv"
ads_csv_file="csv_migration\\ads-test.csv"

from csv import reader
import csv

# read in csv, convert to map organized by 'id' as index root parent value
def read_csv_as_map(csv_filename, id_format, encodingVar):
    print('filename: '+csv_filename+', id_format: '+id_format+', encoding: '+encodingVar)
    dict={}
    dict['rows']={}
    try:
        with open(csv_filename, 'r', encoding=encodingVar) as read_obj:
            csv_reader = reader(read_obj, delimiter='\t')
            csv_cols = None
            for row in csv_reader:
                if csv_cols is None:
                    csv_cols = row
                    dict['csv_cols']=csv_cols
                    print('csv_cols=',csv_cols)
                else:
                    row_id_val = row[csv_cols.index(str(id_format))]
                    print('row_id_val=',row_id_val)
                    dict['rows'][row_id_val] = row
        print('done')
        return dict
    except Exception as e:
        print('err=',e)
        return {}

rtc_dict = read_csv_as_map(rtc_csv_file, 'Id', 'utf-16')
ads_dict = read_csv_as_map(ads_csv_file, 'ID', 'utf-16')
console output:
filename: csv_migration\rtc-test.csv, id_format: Id, encoding: utf-16
csv_cols= ['Summary', 'Status', 'Type', 'Id', '12NC']
row_id_val= 262998
done
filename: csv_migration\ads-test.csv, id_format: ID, encoding: utf-16
err= UTF-16 stream does not start with BOM
If I try utf-16-le instead, I get a different error: 'utf-16-le' codec can't decode byte 0x22 in position 0: truncated data
If I try utf-16-be, I get this error: 'utf-16-be' codec can't decode byte 0x22 in position 0: truncated data
Why can't my Python code read this csv file?
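One thing I have not ruled out is the delimiter: my reader passes delimiter='\t', while the utf-8 attempts show commas between the values. Below is a minimal sketch of guessing the delimiter from a text sample with the standard library's `csv.Sniffer`. The `guess_delimiter` helper and the sample strings are hypothetical, and the sniffer is a heuristic, so I would treat its answer as a hint rather than a guarantee.

```python
import csv

def guess_delimiter(sample_text, candidates=",\t;"):
    """Ask csv.Sniffer which of the candidate characters most likely separates fields."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter

# Made-up samples: one comma-delimited, one tab-delimited.
comma_sample = "ID,Name,Notes\n1,foo,bar\n2,baz,qux\n"
tab_sample = "Summary\tStatus\tType\nabc\topen\tbug\ndef\tclosed\ttask\n"

comma_guess = guess_delimiter(comma_sample)
tab_guess = guess_delimiter(tab_sample)
```

For a real file, the sample could be the first few kilobytes read with the encoding that decodes without errors, and the guessed character could then be passed to `reader(..., delimiter=...)`.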