What are some general guidelines for writing Unicode programs in Python <= 2.7? Is it good practice to prefix every string literal with u, even if it doesn't contain any characters outside the ASCII range?
When dealing with sqlite3, will a parameterized query automatically encode unicode as utf-8, or does that need to be done manually?
When dealing with a 'string' of bytes, should this be left as a string object or decoded into a unicode string? (I believe decoding would throw an exception in most cases.)
If for any reason I need to use a literal unicode character in the code, can I just use that character in a string as long as it is a unicode string and I have my encoding declared at the top of the file?
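To make that last question concrete, this is the kind of thing I mean. It is only a sketch, and it assumes the file really is saved as UTF-8 so that it matches the declaration:

```python
# -*- coding: utf-8 -*-
# The declaration above has to be on the first or second line of the file.

# A literal non-ASCII character inside a unicode string.
s = u'Ⱡ'

# What I would expect: the literal is the same single code point as the escape.
same = (s == u'\u2c60')
```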
EDIT: When printing a unicode string, how do I get the locale of the user's system so that I can correctly encode it? Blindly encoding everything as UTF-8 seems like a bad idea since not all systems support it.
EDIT: I believe I figured this one out. It can be done using locale:
import locale
encoding = locale.getpreferredencoding()
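What I have in mind for using that value is roughly the following sketch. The helper name encode_for_user is something I made up for illustration, and both the 'replace' error handler and the ASCII fallback are my own assumptions about how to handle the awkward cases:

```python
import locale

def encode_for_user(text, encoding=None):
    """Encode a unicode string to bytes using the user's preferred encoding.

    encode_for_user is a hypothetical helper, not part of any library.
    """
    if encoding is None:
        # Assumption: fall back to ASCII if no preferred encoding is reported.
        encoding = locale.getpreferredencoding() or 'ascii'
    # 'replace' substitutes '?' for characters the encoding cannot represent.
    return text.encode(encoding, 'replace')

utf8_bytes = encode_for_user(u'\u2c60', 'utf-8')
cp1252_bytes = encode_for_user(u'\u2c60', 'cp1252')
```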
EDIT: Is this encoding actually done implicitly? Now I am very confused. On Linux, I can do this:
s = u'\u2c60'
print s # prints Ⱡ
print s.encode('utf-8') # prints Ⱡ
But on Windows this happens:
s = u'\u2c60'
print s # prints Ⱡ in IDLE, UnicodeEncodeError in cmd
print s.encode('cp1252') # UnicodeEncodeError
print s.encode('utf-8') # prints â±
print s.encode('cp1252', 'replace') # prints ?
It does seem like print does the conversion implicitly...
EDIT: This question says print will automatically encode to the encoding stored in sys.stdout.encoding: "Why Does Python print unicode characters when the default encoding is ASCII?"
Now I'm wondering, is there a way to make print replace unencodable characters by default? Or do I need to wrap print in my own function, something like:
import sys
def myPrint(msg):
    # Encode for the console, replacing characters it cannot represent
    print msg.encode(sys.stdout.encoding, 'replace')
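An alternative I'm considering is wrapping the output stream itself with codecs.getwriter, so that anything written to it is encoded with 'replace'. This is only a sketch: replacing_writer is a name I made up, the ASCII fallback is my assumption for the case where sys.stdout.encoding is None (for example when output is piped), and the in-memory buffer just stands in for the console.

```python
import codecs
import io

def replacing_writer(byte_stream, encoding):
    """Wrap a byte stream so unicode written to it is encoded with 'replace'.

    replacing_writer is a hypothetical helper name.
    """
    # Assumption: use ASCII when no encoding is reported for the stream.
    writer_class = codecs.getwriter(encoding or 'ascii')
    return writer_class(byte_stream, errors='replace')

# Demonstration on an in-memory stream standing in for a cp1252 console.
buf = io.BytesIO()
out = replacing_writer(buf, 'cp1252')
out.write(u'caf\xe9 \u2c60\n')
result = buf.getvalue()
```

In Python 2 I would presumably install it with sys.stdout = replacing_writer(sys.stdout, sys.stdout.encoding), but I'm not sure whether that has side effects, for instance when printing byte strings that contain non-ASCII data.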
I know most of these problems have been addressed in Python 3, but I would like to support Python <= 2.7.