0

I am parsing through rows and cells in excel files and write to a csv file. I ran into a UnicodeEncodeError: 'charmap' codec can't encode.... I fixed this by adding .encode("utf-8") after the variable. But this output a b'.....' to the string. What if I don't want this encoding. I dont want these special characters.

How can I skip these characters or this error? Here are my code:

import csv
from os import listdir
from os.path import isfile, join
mypath = "C:/Complete_XLSX/"
file_lists = [f for f in listdir(mypath) if isfile(join(mypath, f))]
import xlrd

data_writer = csv.DictWriter(open('downloaded_data.csv', 'w', newline=''),
                             fieldnames=['fund_ticker', 'constituent_ticker', 'Name',
                                         'Detail_Holding_Type', 'Date', 'Holding'])
data_writer.writeheader()

for file in file_lists:
    book = xlrd.open_workbook(f'{mypath}{file}')
    sheet_names = book.sheet_names()
    for sheet in book.sheets():
        fund = sheet.name[0: sheet.name.find('_')]
        for row in sheet.get_rows():
            for col in range(3, sheet.ncols):
                date = xlrd.xldate_as_tuple(sheet.cell_value(0, col), book.datemode)
                date_str = date[0] * 10000 + date[1] * 100 + date[2]
                if row[col].value:
                    try:
                        data_writer.writerow({'fund_ticker': fund, 'constituent_ticker': row[0].value,
                                          'Name': row[1].value.encode("utf-8"), 'Detail_Holding_Type': row[2].value,
                                          'Date': date_str, 'Holding': row[col].value})
duckman
  • 687
  • 1
  • 15
  • 30
  • You probably need to `open(...)` the file to write to using `encoding='utf-8'`. But we'd need to see a representative code snippet to be able to say for sure. – deceze May 27 '20 at 07:06
  • @deceze I added my code, can you please open the post? – duckman May 27 '20 at 07:09
  • As I thought, add `encoding='utf-8'` to that `open(...)` call…? – deceze May 27 '20 at 07:10
  • Yes, But this woull still output the special characters to the CSV. How do I skip these characters? – duckman May 27 '20 at 07:15
  • You would then need to define what exactly "special characters" are. Anything "non-ASCII"? Non-Latin-1? If you can properly output them now, why'd you want to discard them instead? Doesn't that lose information? – deceze May 27 '20 at 07:17
  • These characters don't add value. They include, for example, the letter R with a circle around. I tried adding `encoding='utf-8' ` to the open(...) but it does not work with the `csv.DictWriter.writerow` – duckman May 27 '20 at 08:24
  • nevermind. I got it to work now. thanks – duckman May 27 '20 at 08:30

0 Answers0