```python
import sys
dataset = open('file-00.csv', 'r')
dataset_l = dataset.readlines()
```
When I read the file above, I get the following error:

**UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 156: invalid start byte**
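The message says that byte 0xFE, at position 156 of the data being decoded, is not a valid UTF-8 start byte. In fact 0xFE never appears anywhere in well-formed UTF-8. One way to locate the offending line(s) without the read aborting is to open the file in binary mode and try to decode each line on its own. Splitting on `b"\n"` is safe for this because a newline byte can never occur inside a valid multi-byte UTF-8 sequence. Below is a minimal sketch; the function name `find_undecodable_lines` is hypothetical:

```python
def find_undecodable_lines(byte_lines, encoding="utf-8"):
    """Return (line_number, offset_in_line, bad_bytes) for every line that fails to decode.

    `byte_lines` is any iterable of bytes, e.g. a file opened in binary mode.
    Only the first undecodable sequence on each line is reported.
    """
    problems = []
    for lineno, raw in enumerate(byte_lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start / exc.end delimit the bytes the codec rejected
            problems.append((lineno, exc.start, raw[exc.start:exc.end]))
    return problems


# Usage (binary mode, so nothing is decoded while reading):
# with open('file-00.csv', 'rb') as f:
#     for lineno, offset, bad in find_undecodable_lines(f):
#         print(lineno, offset, bad)
```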
So I changed the code to the following:
```python
import sys
dataset = open('file-00.csv', 'r', errors='replace')
dataset_l = dataset.readlines()
```
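For reference, `errors='replace'` substitutes U+FFFD (the Unicode replacement character) for each undecodable byte, while `errors='ignore'` drops the byte entirely. The small illustration below uses a hypothetical CSV line, not data from the real capture. In this example neither handler removes the commas, so the field count is unchanged, which suggests (though does not prove) that a later `IndexError` comes from lines that were short or blank to begin with rather than from the replaced byte itself:

```python
# Hypothetical CSV line containing a stray 0xFE byte (not from the real capture).
bad = b"10.0.0.1,\xfe,80\n"

replaced = bad.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD
ignored = bad.decode("utf-8", errors="ignore")    # invalid byte is dropped

# replaced == "10.0.0.1,\ufffd,80\n"
# ignored  == "10.0.0.1,,80\n"

# Both still split into the same number of comma-separated fields.
print(len(replaced.split(",")), len(ignored.split(",")))  # 3 3
```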
I also tried `errors='ignore'`. With either option the initial error disappears, but later in my code I get another error:
```python
def find_class_1(row):
    global file_l_sp
    for line in file_l_sp:
        if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
            return line[3].strip()
    return 'other'
```
```
  File "Label_Classify_Dataset.py", line 56, in <module>
    dataset_w_label += dataset_l[it].strip() + ',' + find_class_1(l) + ',' + find_class_2(l) + '\n'
  File "Label_Classify_Dataset.py", line 40, in find_class_1
    if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
IndexError: list index out of range
```
How can I fix either the first error or the second one?
**UPDATE:**

I iterated over the file with `enumerate`, printed each line, and managed to work out which line causes the error. It is indeed a stray character, presumably one that tshark substituted. Deleting it removes the error, but I would rather skip such lines than delete them by hand.
```python
with open('file.csv') as f:
    for i, line in enumerate(f):
        print('{} = {}'.format(i + 1, line.strip()))
```
I'm sure there is a better way to do the enumeration.
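To skip undecodable lines instead of deleting them, one option is to read the file in binary mode and decode each line yourself, discarding the ones that fail. `enumerate(f, start=1)` also yields 1-based line numbers directly, which removes the need for `i + 1`. Below is a minimal sketch; the generator name `decodable_lines` is hypothetical:

```python
import sys


def decodable_lines(byte_lines, encoding="utf-8"):
    """Yield each line decoded as `encoding`, skipping lines that fail to decode.

    `byte_lines` is any iterable of bytes, e.g. a file opened in binary mode.
    """
    for lineno, raw in enumerate(byte_lines, start=1):  # 1-based line numbers
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # Report the bad line on stderr and move on instead of aborting the read
            print(f"skipping line {lineno}: {exc}", file=sys.stderr)
            continue
        yield text


# Usage: a replacement for dataset.readlines()
# with open('file-00.csv', 'rb') as f:
#     dataset_l = list(decodable_lines(f))
```

Because the file is opened in binary mode, Windows-style `\r\n` line endings are kept, but the existing `.strip()` calls remove them. If `dataset_l` is indexed in parallel with another list built from the same file (the `dataset_l[it]` in the traceback suggests it might be), build both from the filtered lines so their indices stay aligned.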