I have a large file where any unicode character that wasn't in UTF-8 got replaced by its code point in angle brackets (e.g. the "" was converted to "<U+0001F44D>"). Now I want to revert this with a regex substitution.
I've tried to acomplish this with
re.sub(r'<U\+([A-F0-9]+)>',r'\U\1', str)
but obviously this won't work because we cannot insert the group into this unicode escape. What's the best/easiest way to do this? I found many questions trying to do the exact opposite but nothing useful to 're-encode' these code points as actual characters...