If you're trying to figure out how to treat this as a single string '\\n' that can then be interpreted as the single character '\n' according to some set of rules, like Python's unicode-escape rules, you have to decide exactly what you want before you can code it.
First, to turn a list of two single-character strings into one two-character string, just use join:
>>> value = ['\\', 'n']
>>> escaped_character = ''.join(value)
>>> escaped_character
'\\n'
Next, to interpret a two-character escape sequence as a single character, you have to know which escape rules you're trying to undo. If it's Python's Unicode escape, there's a codec named unicode_escape that does that:
>>> character = escaped_character.decode('unicode_escape')
>>> character
u'\n'
If, on the other hand, you're trying to undo UTF-8 encoding followed by Python string-escape, or C backslash escapes, or something different, you obviously have to write something different. And given what you've said about UTF-8, I think you probably do want something different. For example, u'é'.encode('UTF-8') is the two-byte sequence '\xc3\xa9'. Just calling decode('unicode_escape') on that will give you the two-character sequence u'\u00c3\u00a9', which is not what you want.
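As a rough illustration of that UTF-8 case, here is a minimal sketch. It assumes Python 3 (unlike the 2.x sessions in this answer) and assumes the input is UTF-8 bytes that were then backslash-escaped; the variable names are made up for the example. It shows the mojibake and one way to undo it, by going back through Latin-1 to recover the raw bytes:

```python
# Python 3 sketch: 'é' encoded as UTF-8, then backslash-escaped.
original = 'é'
utf8_bytes = original.encode('utf-8')            # b'\xc3\xa9'
escaped = b'\\xc3\\xa9'                          # 8 ASCII bytes: \ x c 3 \ x a 9

# unicode_escape alone turns each escaped byte into one code point (mojibake).
wrong = escaped.decode('unicode_escape')         # 'Ã©', i.e. '\u00c3\u00a9'

# Map those code points back to raw bytes via Latin-1, then decode as UTF-8.
right = wrong.encode('latin-1').decode('utf-8')  # 'é'
```

The Latin-1 step works because that encoding maps code points 0 through 255 one-to-one onto byte values.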
Anyway, now that you've got a single character, just call ord:
>>> char_ord = ord(character)
>>> char_ord
10
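Putting the three steps together, here is a minimal sketch for Python 3, where str has no decode method, so the text has to go through bytes first. The function name ord_of_escape is hypothetical, and it assumes the escape follows Python's unicode-escape rules and contains only ASCII characters:

```python
def ord_of_escape(parts):
    """Join the pieces of a backslash escape and return its code point."""
    escaped = ''.join(parts)                  # ['\\', 'n'] -> '\\n'
    # Python 3: encode to bytes, then undo the escape with the codec.
    character = escaped.encode('ascii').decode('unicode_escape')
    # ord raises TypeError unless the result is exactly one character.
    return ord(character)


print(ord_of_escape(['\\', 'n']))  # 10
```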
I'm not sure what the convert-to-unicode bit is about. If this is Python 3.x, the strings are already Unicode. If it's 2.x, and the strings are ASCII, it's guaranteed that ord(s) == ord(unicode(s)). If it's 2.x, and the strings are in some other encoding, just calling unicode on them is going to give you a UnicodeError or mojibake; you need to pass an encoding in as well, in which case you might as well use the decode method.
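To make the encoding point concrete, here is a small sketch. It assumes Python 3, where the 2.x byte string becomes a bytes object, and it assumes Latin-1 as the source encoding purely for illustration:

```python
raw = b'\xe9'  # 'é' encoded as Latin-1 (assumed encoding for this example)

# Decoding with the right encoding gives one character with the expected ordinal.
code_point = ord(raw.decode('latin-1'))
print(code_point)  # 233

# Decoding as ASCII fails, because 0xe9 is outside the ASCII range.
try:
    raw.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the wrong (but permissive) encoding silently gives mojibake.
mojibake = 'é'.encode('utf-8').decode('latin-1')  # 'Ã©'
```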