When I read some of the files like this:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding='cp1252')

Error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1260: character maps to <undefined>

When I switch to this:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding="utf-8")

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1459: invalid start byte

I have read that I should open the file in binary mode, but I'm not sure how to do that. Here is my function:

import glob

def readingAndAddToList():
    list_of_files = glob.glob('./*.txt')  # create the list of files
    for file_name in list_of_files:
        # the with-block closes the file even if read() raises
        with open(file_name, 'r', encoding="utf-8") as FI:
            stext = textProcessing(FI.read())
        # split returns a list of words delimited by sequences of whitespace (including tabs, newlines, etc., like re's \s)
        secondaryWord_list = stext.split()
        word_list.extend(secondaryWord_list)  # add words to the main list
        # message (Romanian): "The length of file <name> is <n> characters"
        print("Lungimea fisierului ", FI.name, " este de", len(secondaryWord_list), "caractere")
        sortingAndNumberOfApparitions(secondaryWord_list)

Only the beginning of my function matters, because I get the error at the reading step.

Arnie97
Adrian
  • Can you try with `open(file_name, 'r', errors='ignore')`? Does it give you the required output? – Hello.World Mar 19 '19 at 13:41
  • Can you share the problematic file, or identify the symbol(s) that cause the exception yourself? – Alderven Mar 19 '19 at 13:48
  • @Hello.World It worked this way, because those characters were something like " ' ", which I don't need at all. – Adrian Mar 19 '19 at 14:11
  • @Adrian Yes, I thought that was the issue. I didn't have a sample of your file, so I told you to try! Anyway, happy to help! :) – Hello.World Mar 19 '19 at 14:12

1 Answer

If you are on Windows, open the file in Notepad and save it with the desired encoding. On Linux, do the same in a text editor. Your program should then be able to read it.
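If re-saving every file by hand is not practical, the same idea can be sketched in Python. Both errors in the question are consistent with files that are mostly cp1252 but contain a stray byte: 0x92 is a curly apostrophe in cp1252 and is not a valid UTF-8 start byte, while 0x9d has no mapping in Python's cp1252 codec. The helper below is only a sketch under that assumption. The name `read_text_lenient` and the order of encodings are hypothetical choices, not something from the question. It reads the file in binary mode, tries each encoding strictly, and only then falls back to replacing undecodable bytes.

```python
from pathlib import Path


def read_text_lenient(path, encodings=("utf-8", "cp1252")):
    """Return the text of `path`, trying each encoding in turn.

    Hypothetical helper: the encoding list is an assumption about the data.
    """
    # Reading bytes never raises UnicodeDecodeError; decoding happens below.
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc)  # strict decode: fails on any bad byte
        except UnicodeDecodeError:
            continue
    # No encoding decoded cleanly: use the last one and turn each
    # undecodable byte into U+FFFD instead of raising.
    return raw.decode(encodings[-1], errors="replace")
```

In the loop from the question, `textProcessing(FI.read())` could then become `textProcessing(read_text_lenient(file_name))`. Passing `errors="ignore"` instead of `errors="replace"` would drop the bad bytes silently, which matches the suggestion in the comments.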