Python: how to ignore special characters when parsing through strings

Question

I am parsing through rows and cells in excel files and write to a csv file. I ran into a UnicodeEncodeError: 'charmap' codec can't encode.... I fixed this by adding .encode("utf-8") after the variable. But this output a b'.....' to the string. What if I don't want this encoding. I dont want these special characters.

How can I skip these characters or this error? Here are my code:

import csv
from os import listdir
from os.path import isfile, join
mypath = "C:/Complete_XLSX/"
file_lists = [f for f in listdir(mypath) if isfile(join(mypath, f))]
import xlrd

data_writer = csv.DictWriter(open('downloaded_data.csv', 'w', newline=''),
                             fieldnames=['fund_ticker', 'constituent_ticker', 'Name',
                                         'Detail_Holding_Type', 'Date', 'Holding'])
data_writer.writeheader()

for file in file_lists:
    book = xlrd.open_workbook(f'{mypath}{file}')
    sheet_names = book.sheet_names()
    for sheet in book.sheets():
        fund = sheet.name[0: sheet.name.find('_')]
        for row in sheet.get_rows():
            for col in range(3, sheet.ncols):
                date = xlrd.xldate_as_tuple(sheet.cell_value(0, col), book.datemode)
                date_str = date[0] * 10000 + date[1] * 100 + date[2]
                if row[col].value:
                    try:
                        data_writer.writerow({'fund_ticker': fund, 'constituent_ticker': row[0].value,
                                          'Name': row[1].value.encode("utf-8"), 'Detail_Holding_Type': row[2].value,
                                          'Date': date_str, 'Holding': row[col].value})

You probably need to `open(...)` the file to write to using `encoding='utf-8'`. But we'd need to see a representative code snippet to be able to say for sure. — deceze, May 27 '20 at 07:06
As I thought, add `encoding='utf-8'` to that `open(...)` call…? — deceze, May 27 '20 at 07:10
Yes, But this woull still output the special characters to the CSV. How do I skip these characters? — duckman, May 27 '20 at 07:15
You would then need to define what exactly "special characters" are. Anything "non-ASCII"? Non-Latin-1? If you can properly output them now, why'd you want to discard them instead? Doesn't that lose information? — deceze, May 27 '20 at 07:17
These characters don't add value. They include, for example, the letter R with a circle around. I tried adding `encoding='utf-8' ` to the open(...) but it does not work with the `csv.DictWriter.writerow` — duckman, May 27 '20 at 08:24

Python: how to ignore special characters when parsing through strings

0 Answers0