I want to read several .txt files, but I get an error on this line:

lyrics = "".join(f.readlines())

The error is:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1148: character maps to <undefined>

How can I fix it? Any help would be appreciated.

My function is:

import os
import re

def read_lyrics():
    reg1 = re.compile(r"\.txt$")
    reg2 = re.compile(r"([0-9]+)\.txt")
    reg3 = re.compile(r".*_([0-9])\.txt")
    reg4 = re.compile(r"\[.+\]")
    reg5 = re.compile(r"info\.txt")
    lyrics_dictionary = {}
    # iterate over all directories and load every song (.txt file)
    for i in os.listdir():
        if os.path.isdir(i):
            for path, sub, items in os.walk(i):
                if any([reg1.findall(item) for item in items]):
                    for item in items:
                        if reg5.findall(item):
                            continue
                        if reg3.findall(item):
                            num = ["0" + reg3.findall(item)[0]]
                            name = "_".join(path.split("/") + num)
                        else:
                            name = "_".join(path.split("/") + reg2.findall(item))

                        print("The path is: ", path)
                        print("The item is: ", item)

                        with open(os.path.join(path, item), "r") as f:
                            print("The file path is: ", f)
                            # the UnicodeDecodeError is raised on the next line
                            lyrics = "".join(f.readlines())

                            lyrics = reg4.subn("", lyrics)[0]
                            lyrics_dictionary[name] = lyrics
    return lyrics_dictionary
Shaido
yagya
  • So when you tried putting `'charmap'+codec+can't+decode+byte+0x8d+in+position+1148` into a search engine, and looked at [the results](https://duckduckgo.com/?t=ffsb&q=%27charmap%27+codec+can%27t+decode+byte+0x8d+in+position+1148&ia=web), and tried the advice in the results, what happened? – Karl Knechtel Jan 30 '21 at 09:53
  • Your `open` call is `open(os.path.join(path,item),"r") as f`. In Python 3 that would open a file with the default encoding of UTF-8. But you are getting an error message about a charmap encoding, which suggests to me that you might be running this code in Python 2. If you are, your `print()` calls will put parens in the output, as `('The path is', '...')`. – BoarGules Jan 30 '21 at 10:32

1 Answer

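When you call open() without an encoding argument, Python falls back to a platform-dependent default encoding. The 'charmap' in your error message suggests that default is a single-byte code page such as cp1252, which most likely does not match your files. Try specifying the encoding explicitly, for example: with open(os.path.join(path, item), "r", encoding="utf8") as f. Or, if you can, find out which encoding was used to write the files and pass that instead.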

Also check the answers to this post; one of them might help you.
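If you do not know the encoding in advance, one option is to try a few candidates in order. The following is only a minimal sketch: the helper name `read_text_with_fallback` and the candidate list `("utf-8", "cp1252")` are assumptions made for illustration, not something taken from your code, and you should adjust the list to whatever your files are likely to use.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return the file's text, trying each candidate encoding in order.

    Hypothetical helper: the candidate encodings are an assumption.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    # No candidate decoded cleanly: decode with the first one anyway and
    # substitute U+FFFD for every byte sequence that cannot be decoded.
    return raw.decode(encodings[0], errors="replace")
```

In your function you could then replace the `with open(...)` block with `lyrics = read_text_with_fallback(os.path.join(path, item))`. Note that a fallback like this can silently pick the wrong encoding for some files, so knowing the real encoding is still preferable.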

Karl Knechtel
Noga K
  • When the question is answered by an existing Stack Overflow post like this, please do not add your own answer - instead, vote to close the question as a duplicate. – Karl Knechtel Jan 30 '21 at 09:56