I'm looking for an efficient way to translate escape sequences in a (Unicode) string into their target characters. The strings come from a parsed language and are read from a file; we want to transform them according to the following rules (note: these escaping rules differ from those of Python itself):
\uxxxx (four hex digits) --> gives the Unicode character with the given code point
\LF \CR \CR+LF --> '' : a backslash character followed by a line break is
removed together with the line break, where the line break is not platform specific.
(For example: "aa\\\nbb", "aa\\\rbb", "aa\\\r\nbb" all give "aabb")
\f --> FF char
\n --> LF char
\r --> CR char
\t --> TAB char
\C where C is any other *Unicode* character ---> gives C itself.
This includes the escaped backslash '\\' sequence, which should be consumed
first from left to right:
r'\\\\u0050' --> r'\\u0050'
r'\\\\\u0050' --> r'\\P'
(Basically, these rules are somewhat similar to the escaping rules available in many languages, for example Perl and Ruby, if I'm not mistaken.)
(Please note: my usage of raw or normal form of strings in the examples is just for illustration to show how exactly the strings are translated)
Given these rules, is it possible to improve on the most naive method of looping through the string, doing lookaheads, and appending to a target string in the process?
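For reference, here is a minimal sketch of the naive loop I have in mind. It assumes Python 3 (`chr` rather than `unichr`); the name `unescape_naive` is just illustrative. Two behaviours are my own assumptions because the rules above do not cover them: a lone backslash at the end of the string is kept as-is, and `\u` not followed by four hex digits falls under the "any other character" rule and yields `u`.

```python
def unescape_naive(s):
    """Translate escapes one character at a time, using lookahead."""
    simple = {'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
    hexdigits = '0123456789abcdefABCDEF'
    out = []
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c != '\\' or i + 1 >= n:
            # Ordinary character, or a lone trailing backslash
            # (kept unchanged; this is an assumption).
            out.append(c)
            i += 1
            continue
        nxt = s[i + 1]
        digits = s[i + 2:i + 6]
        if nxt == 'u' and len(digits) == 4 and all(d in hexdigits for d in digits):
            # \uxxxx: emit the character with that code point.
            out.append(chr(int(digits, 16)))
            i += 6
        elif nxt == '\r':
            # Backslash + CR, or backslash + CR+LF: drop everything.
            i += 3 if s[i + 2:i + 3] == '\n' else 2
        elif nxt == '\n':
            # Backslash + LF: drop both.
            i += 2
        elif nxt in simple:
            out.append(simple[nxt])
            i += 2
        else:
            # \C for any other character C (including '\\') gives C itself.
            out.append(nxt)
            i += 2
    return ''.join(out)
```

Because the index always jumps past a whole escape sequence, an escaped backslash is consumed before the next character is examined, which gives the left-to-right behaviour described above.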
A somewhat similar question here offers answers based on splitting and re-joining the string, but I don't think that can be applied here because of the successive escapes issue.
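One direction that might sidestep the successive-escapes problem is a single `re.sub` pass, since `re.sub` scans left to right and never re-scans the text it has already replaced. The following is only a sketch under the same assumptions as my rules (Python 3; `\u` without four hex digits yields `u`; a lone trailing backslash is left alone); the names `_ESCAPE`, `_SIMPLE`, `_replace`, and `unescape_re` are made up for illustration, and I have not measured whether it is actually faster.

```python
import re

# One alternative per rule, tried in order: \uxxxx, backslash + line break,
# then backslash + any other single character (DOTALL lets '.' match anything).
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|(\r\n|\r|\n)|(.))', re.DOTALL)
_SIMPLE = {'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

def _replace(m):
    hexcode, linebreak, other = m.groups()
    if hexcode is not None:
        return chr(int(hexcode, 16))   # \uxxxx
    if linebreak is not None:
        return ''                      # backslash + line break vanishes
    return _SIMPLE.get(other, other)   # \f \n \r \t, else the character itself

def unescape_re(s):
    return _ESCAPE.sub(_replace, s)
```

With this, `unescape_re(r'\\\\\u0050')` consumes the two escaped backslashes first and only then sees `\u0050`, which matches the ordering I described.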