I have a CSV file with 20501 rows and 26 columns. I want to select the data from columns 5 and 9. Here is what I have:

import csv 
filename = 'feed_data.csv'
f = open(filename)
readCSV = csv.reader(f, delimiter=',')
names = []
confidence_score = []
for row in readCSV:
    names.append(row[8])
    confidence_score.append(row[4])

Here is the error:

Traceback (most recent call last):
File "C:/Users/raady/PycharmProjects/feeder_Classification/test.py", line 10, in <module>
for row in readCSV:
File "C:\Users\raady\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1009: character maps to <undefined>

How can I rectify the error? I don't want to use pandas.

Is there any way to copy both columns into a single variable, instead of keeping names and confidence_score separately?

Edit: I have installed Python 3.6 and the PyCharm environment. I have installed all the packages from the PyCharm environment.

Edit 2: I tried the fix from the suggested link by changing the call to f=open(filename,encoding='utf8'), but I still get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 934: invalid start byte. The CSV file has been encoded in utf8.

Edit 3: I slightly modified the code like this:

import csv
filename = 'feed_data.csv'
# filename = 'test.csv'

with open(filename) as csvfile:
   readCSV = csv.reader(csvfile, delimiter=',')
   data2 = []
   for row in readCSV:
       data = []
       data.append(row[14]) # appending names
       data.append(row[5])  # appending confidence
       data2.append(data)

   print(data2)

I am adding the two files, test.csv and feed_data.csv (downloaded directly from Kaggle). With test.csv the code works fine and I can select the required column data, but with feed_data.csv it gives the error mentioned above.

TylerH
Raady
  • Do you know encoding type of file in question? – Rao Sahab Mar 15 '18 at 08:40
  • I have mentioned utf8 as encoding type and it gives me this error, UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 934: invalid start byte – Raady Mar 15 '18 at 08:43
  • Possible duplicate of [UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character) – David Leon Mar 15 '18 at 08:45
  • I tried specifying the encoding with f=open(filename,encoding = 'utf8'), but then I get the error mentioned in my comment above – Raady Mar 15 '18 at 08:52
  • I am using python 3.6 , will this information help ? – Raady Mar 15 '18 at 08:54

1 Answer

Answer moved from a question edit:

A little modification helped:

with open(filename, encoding='utf8', errors='ignore') as csvfile:

The issue is with the data file itself: the information about its actual encoding is missing. I tried the available encodings by checking the file in Visual Studio Code, and none of them decoded it cleanly. Some of the row data is corrupted, and with errors='ignore' the bytes that cannot be decoded are silently dropped instead of raising UnicodeDecodeError.
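
For completeness, here is a minimal sketch of how that fix could be combined with the original goal of collecting both columns in one variable. The helper names read_columns and find_undecodable_lines are hypothetical, not part of the csv module, and the sketch assumes the file is mostly UTF-8 with a few corrupted bytes, as described above. The second helper is only a diagnostic: it reports which physical lines contain bytes that will be dropped.

```python
import csv


def read_columns(path, indices, encoding="utf8", errors="ignore"):
    """Return one list of rows, each holding only the requested columns."""
    selected = []
    # newline="" is what the csv module documentation recommends for csv.reader
    with open(path, encoding=encoding, errors=errors, newline="") as csvfile:
        for row in csv.reader(csvfile, delimiter=","):
            # Skip rows that are too short to contain every requested column
            if len(row) > max(indices):
                selected.append([row[i] for i in indices])
    return selected


def find_undecodable_lines(path, encoding="utf8"):
    """Return the 1-based numbers of physical lines that fail to decode."""
    bad = []
    with open(path, "rb") as raw:
        for number, line in enumerate(raw, start=1):
            try:
                line.decode(encoding)
            except UnicodeDecodeError:
                bad.append(number)
    return bad
```

With the indices from Edit 3, the call would be data2 = read_columns('feed_data.csv', [14, 5]). Since errors='ignore' changes the data without any warning, it may be worth running find_undecodable_lines('feed_data.csv') first to see how many lines are affected; errors='replace' is an alternative that inserts the U+FFFD replacement character so the damaged values stay visible.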

TylerH