I have some strings in Python loaded from a file. They look like lists, but are actually strings, for example:
example_string = '["hello", "there", "w\\u00e5rld"]'
I can easily convert it into an actual list of strings:
from typing import List
def string_to_list(string_list: str) -> List[str]:
    # Strip quotes and brackets, then split on commas
    converted = string_list.replace('"', '').replace('[', '').replace(']', '').split(',')
    return [s.strip() for s in converted]
as_list = string_to_list(example_string)
print(as_list)
This prints the following list of strings: ['hello', 'there', 'w\\u00e5rld']
The problem is the encoding of the last element of the list. That is how it looks when I run print(as_list), but if I run
for element in as_list:
    print(element)
it returns
hello
there
w\u00e5rld
I don't know what happens to the first backslash; it seems to me like it is there to escape the second one. How do I make Python resolve the Unicode escape and print "wårld"? The problem is that it is already a string (str), not bytes, so as_list[2].decode("UTF-8") does not work.
I tried using string.decode() and I tried plain printing, but neither gives me the actual character.
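For reference, here is a minimal sketch of one possible approach. It assumes that the strings in the file are valid JSON arrays, which the example above happens to be; that is a guess about the file format, not something I can confirm. In that case the standard library's json.loads parses the list and resolves the \uXXXX escapes in one step:

```python
import json

# Same sample as above. In the source literal, "\\u00e5" is a single
# backslash followed by u00e5 once Python has read the string.
example_string = '["hello", "there", "w\\u00e5rld"]'

# Assumption: the string is valid JSON. json.loads then parses the array
# and turns the \u00e5 escape into the character it stands for.
as_list = json.loads(example_string)

for element in as_list:
    print(element)
# prints:
# hello
# there
# wårld
```

As for the backslash: print(as_list) shows the repr of each element, which writes a literal backslash as \\, while print(element) shows the string itself, which contains only one backslash.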