I'm trying to create a little program that reads the contents of two stories, Alice in Wonderland & Moby Dick, and then counts how many times the word 'the' is found in each story.
However, I'm having trouble getting the Geany text editor to open the files. I've been creating and using my own small text files with no issues so far.
with open('alice_test.txt') as a_file:
    contents = a_file.readlines()
print(contents)
I get the following error:
Traceback (most recent call last):
  File "add_cats_dogs.py", line 50, in <module>
    print(contents)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2018' in position 279: character maps to <undefined>
As I said, no issues experienced with any small homemade text files.
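For what it's worth, the traceback points at print() and the cp437 codec rather than at open(), so the failure seems to happen when the text is written to the console, not when the file is read. Here is a minimal sketch that reproduces the same error without any file, assuming the console really is using cp437 as the traceback suggests. The variable names are only for illustration.

```python
# Reproduce the failure without any file: cp437 has no mapping for U+2018.
ch = '\u2018'  # LEFT SINGLE QUOTATION MARK, the character named in the traceback

try:
    ch.encode('cp437')
    encodable = True
except UnicodeEncodeError as err:
    encodable = False
    message = str(err)

# With errors='replace', the unmappable character becomes '?' instead of raising.
replaced = ch.encode('cp437', errors='replace')
print(encodable, replaced)
```

That would also fit with my homemade files working, since they most likely contain only plain ASCII characters.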
Strangely enough, when I execute the above code in Python IDLE, I have no problems, even if I change the text file's encoding between UTF-8 and ANSI.
I tried encoding the text file as UTF-8 and as ANSI. I also checked that the default encoding of Geany is UTF-8 (and tried without using the default encoding), and tried both using and not using fixed encoding when opening non-Unicode files.
I get the same error every time. The text file was from gutenberg.org; I tried another file from there and got the same issue.
I know it must be some sort of issue between Geany and the text file, but I can't figure out what.
EDIT: I found a sort of fix.
Here is the text that was giving me problems: https://www.gutenberg.org/files/11/11-0.txt
Here is the text that I can use without problems: http://www.textfiles.com/etext/FICTION/alice13a.txt
The top one is encoded in UTF-8, the bottom one in windows-1252. I would've imagined the reverse to be true, but for whatever reason the UTF-8 encoding seems to be causing the problem.
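In case it helps anyone answering, here is a rough sketch of what I'm ultimately trying to do, under a few assumptions: the file is UTF-8, the console is cp437 (as in the traceback), and a whole-word, case-insensitive match is what I want for 'the'. The helper names count_word and console_safe are made up for this example, and the small temporary file just stands in for the real story.

```python
import os
import re
import tempfile

def count_word(path, word, encoding='utf-8'):
    """Count whole-word, case-insensitive occurrences of `word` in a text file."""
    # Passing the encoding explicitly avoids relying on the platform default.
    with open(path, encoding=encoding) as f:
        text = f.read()
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def console_safe(text, encoding='cp437'):
    """Replace characters the console encoding cannot represent with '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)

# Small sample containing curly quotes (U+2018, U+2019), standing in for the real file.
sample = 'The Rabbit said \u2018the time!\u2019 and then left.'
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

the_count = count_word(sample_path, 'the')
print(the_count)
print(console_safe(sample))  # printable even on a cp437 console
os.remove(sample_path)
```

Counting never needs to print the story itself, so only the number goes to the console. The console_safe step matters only if I want to look at the text.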