As far as I know, Python's design is that a string contains only valid characters, but in my case the OS delivers path names with invalid encodings that I have to deal with. So I end up with strings that contain characters that are not valid Unicode.
To correct these problems, I need to display these strings somehow. Unfortunately, I cannot print them because they contain non-Unicode characters. Is there an elegant way to replace these characters, so that I at least get some idea of the string's content?
My idea is to process these strings character by character and check whether each character is actually valid Unicode. For an invalid character I would substitute a certain Unicode symbol. But how can I do this? Using codecs
does not seem suitable for that purpose: I already have a string, returned by the operating system, not a byte array. Converting a string to a byte array involves encoding, which of course fails in my case. So it seems that I'm stuck.
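To make the character-by-character idea concrete, here is a minimal sketch. It assumes the invalid characters are lone surrogate code points (U+D800 to U+DFFF), which is what Python 3 produces when it decodes undecodable bytes with the `surrogateescape` error handler; whether that matches my actual strings is an assumption. The function name `make_printable` and the sample file name are made up for illustration.

```python
def make_printable(s: str, replacement: str = "\ufffd") -> str:
    """Return s with every lone surrogate code point replaced.

    Assumption: "invalid" means a code point in U+D800..U+DFFF,
    as produced by the surrogateescape error handler.
    """
    return "".join(
        replacement if 0xD800 <= ord(ch) <= 0xDFFF else ch
        for ch in s
    )


# Simulate a path name whose bytes are not valid UTF-8
# (hypothetical example; 0xE9 is a Latin-1 byte, not valid UTF-8 here).
raw = b"caf\xe9.txt"
name = raw.decode("utf-8", "surrogateescape")

# The undecodable byte became a lone surrogate; swap it for U+FFFD.
printable = make_printable(name)

# Alternative without a loop: let the codec do the substitution.
# errors="replace" writes "?" for each character that cannot be encoded.
via_codec = name.encode("utf-8", "replace").decode("utf-8")
```

If the escaped byte values matter for debugging, `errors="backslashreplace"` in the `encode` call keeps them visible as escape sequences instead of collapsing them into a single placeholder.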
Do you have any tips on how to create such a replacement string?