
I am trying to figure out this error that pops up from this code:

import os

filename = os.path.join(os.path.expanduser("~"), "data", "blogs",
                        "1005545.male.25.Engineering.Sagittarius.xml")
#filename = open('C:/Users/spenc/data/blogs/1005545.male.25.Engineering.Sagittarius.xml',
#encoding='utf-8', errors = 'ignore')

allPosts = []
# No encoding is passed here, so open() uses the platform default (cp1252 per the traceback)
with open(filename) as inf:
    postStart = False
    post = []
    for line in inf:
        line = line.strip()
        if line == "<post>":
            postStart = True
        elif line == "</post>":
            postStart = False
            allPosts.append("\n".join(post))
            post = []
        elif postStart:
            post.append(line)
print(allPosts[0])
print(len(allPosts))
# filename is a str, and the with block already closes the file, so no close() call is needed

and get this error:

  File "D:\Anaconda-Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined>

I am just trying to resolve the encoding error so that the script can print the first post and the number of posts, but it keeps getting caught up on the allPosts.append line. I am not sure of any workaround, or whether there is a newer way of doing this sort of thing. I am following a textbook, but I can't continue with the chapter until this is worked out.

Sbrads
  • Please provide a [mcve]. Is there a particular reason not to use a proper XML parser here? – AMC Jul 08 '20 at 21:05
  • Does this answer your question? https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character – Ronald Jul 08 '20 at 21:11
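
Following the linked question, one possible fix is to pass an explicit encoding to open(). The traceback shows Python falling back to cp1252, which has no character for byte 0x9d. That byte does occur in UTF-8 sequences (for example in a curly closing quote), so the sketch below assumes the file is UTF-8; that is a guess, not something the question confirms. The function name read_posts and the errors="replace" choice are illustrative only.

```python
import os


def read_posts(path, encoding="utf-8", errors="replace"):
    """Return the text of every <post>...</post> block in the file at `path`.

    `encoding` and `errors` are assumptions: UTF-8 is a guess about the file,
    and errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    """
    posts = []
    current = []
    in_post = False
    # An explicit encoding stops open() from using the platform default (cp1252 here).
    with open(path, encoding=encoding, errors=errors) as inf:
        for line in inf:
            line = line.strip()
            if line == "<post>":
                in_post = True
            elif line == "</post>":
                in_post = False
                posts.append("\n".join(current))
                current = []
            elif in_post:
                current.append(line)
    return posts


if __name__ == "__main__":
    # Same path as in the question; nothing runs if the file is not there.
    filename = os.path.join(os.path.expanduser("~"), "data", "blogs",
                            "1005545.male.25.Engineering.Sagittarius.xml")
    if os.path.exists(filename):
        all_posts = read_posts(filename)
        print(len(all_posts))
        if all_posts:
            print(all_posts[0])
```

If the text still comes out garbled, the UTF-8 guess is probably wrong and another encoding (latin-1, for instance, which accepts every byte) may be worth trying. errors="ignore", as in the commented-out line in the question, would also avoid the exception but silently drops the offending bytes.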
