I am trying to figure out this error that pops up from this code:
filename = os.path.join(os.path.expanduser("~"), "data", "blogs",
"1005545.male.25.Engineering.Sagittarius.xml")
#filename = open('C:/Users/spenc/data/blogs/1005545.male.25.Engineering.Sagittarius.xml',
#encoding='utf-8', errors = 'ignore')
all_posts = []
allPosts = []
with open(filename) as inf:
postStart = False
post = []
for line in inf:
line = line.strip()
if line == "<post>":
postStart = True
elif line == "</post>":
postStart = False
allPosts.append("\n".join(post))
post =[]
elif postStart:
post.append(line)
print(allPosts[0])
print(len(allPosts))
filename.close()
and get this error:
File "D:\Anaconda-Python\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4836: character maps to <undefined> here
I am just trying to figure out the encoding error to make sure this works in finding the length of the posts and print the post itself, but it keeps getting caught up on the allposts.append line. Not really sure of anywork around or if there is a newer way of doing something of this sort. I was trying to follow a textbook on it, but cant continue on in the chapter until this has been worked out.