I have a file that contains ASCII lines like
"\u0627\u0644\u0625\u062f\u0627"
(including the quote marks). I want to output these lines with the actual characters, encoded as UTF-8, like
"الإدا"
(These happen to be Arabic, but a solution would presumably work for any Unicode code points, at least in the Basic Multilingual Plane.)
If I type an ASCII string like that into the Python 3 interpreter, say
s = '"\u0627\u0644\u0625\u062f\u0627"'
and then ask Python what the value of that variable is, it displays the string in the way I want:
'"الإدا"'
But if I readlines() a file containing strings like that and write each line back out, I just get the ASCII representation again. In other words, this code:
from sys import stdin, stdout
for s in stdin.readlines(): stdout.write(s)
just gives me back an output file identical to the input file.
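As far as I can tell, that is because the backslashes in the file are literal characters: each \uXXXX arrives as six ordinary ASCII characters, not as one escape. A quick check makes this visible (assuming the example line above is the first line on stdin):
from sys import stdin
line = stdin.readline()
print(repr(line))  # '"\\u0627\\u0644\\u0625\\u062f\\u0627"\n' -- the backslashes are literal
print(len(line))   # 33: two quotes + five escapes of six characters each + newline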
How do I convert the read-in string so that it is written out as UTF-8, with the actual non-ASCII characters in place of the escape sequences?
I know I can parse the string myself and handle each \uXXXX sub-string individually using a regex, slices, and chr(int()); a sketch of what I mean is below. But surely there is a way to use Python's built-in handling of strings represented this way, so I don't have to parse them myself, which should also be faster. (And yes, if the input contains improperly formed \u escapes, I can deal with the resulting error messages.)
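For reference, this is roughly the manual approach I mean (a sketch only; decode_escapes is just an illustrative name, and it handles only well-formed four-digit BMP escapes):
import re

ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_escapes(line):
    # Replace each literal \uXXXX sequence with the character it names.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)

# e.g. decode_escapes(r'"\u0627\u0644"') == '"ال"'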