My sample.txt:
é Roméo et Juliette vécu heureux chaque après
My program:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
with open("test4", "r") as f:
    s = f.read()
print(s)
print(isinstance(s, unicode))
print(s[0].isalnum())
My output:
é Roméo et Juliette vécu heureux chaque après
False
False
The questions "Python isalpha() and scandics" and "How do I check if a string is unicode or ascii?" led me to believe that both statements should print True.
My hypotheses:
1. Emacs is using "iso-latin-1" as the file encoding, which is mucking things up
2. isalnum() depends on something other than encoding
3. Line 2 isn't working
My biggest worry is #2. I do not really care about the result of isalnum(), I just want the result to be consistent for different machines/people. Worst case, I can just roll my own isalnum(); but I am curious why I am experiencing this behaviour in the first place.
Also, I want to be sure my program understands UTF-8 encoded documents across different machines as well.
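One way I could take the editor out of the picture and check my first hypothesis is to write a known UTF-8 file from Python and read it back both as raw bytes and as decoded text. This is only a minimal sketch, assuming the data really is UTF-8; the temporary file and the variable names are made up for illustration, and io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile

# Hypothetical sample text, written with escapes so the source file's own
# encoding cannot affect the result.
sample = u"\u00e9 Rom\u00e9o"

# Write the sample as UTF-8 to a temporary file (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Binary mode returns the undecoded bytes.
with io.open(path, "rb") as f:
    raw = f.read()

# An explicit encoding returns decoded text (unicode on 2.7, str on 3).
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Check the first character of the decoded text rather than the first byte.
first_is_alnum = text[0].isalnum()
print(len(raw), len(text), first_is_alnum)
```

If the byte count and the character count differ here, then indexing the undecoded string with s[0] would only pick up part of the first character, which might explain what I am seeing.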
Any ideas of what is going on?