3

I have a UTF-8 encoded file that contains multiple lines like

\x02I don't like \x0307bananas\x03.\x02
Hey, how are you doing?
You called?

How do I read the lines of that file to a list, decoding all the escape sequences? I tried the code below:

import codecs
import random

with codecs.open(file, 'r', encoding='utf-8') as q:
    quotes = q.readlines()

print(str(random.choice(quotes)))

But it prints the line without decoding the escape sequences:

\x02I don't like \x0307bananas\x03.\x02

(Note: the escape characters are IRC formatting codes: \x02 is the character for bold text, and \x03 is the prefix for color codes. Also, this code is from my IRC bot, with the MSG function replaced by print().)

zertap

3 Answers

11

According to this answer, making the following change should give the expected result.

In Python 3:

codecs.open(file, 'r', encoding='utf-8') to

codecs.open(file, 'r', encoding='unicode_escape')

In Python 2:

codecs.open(file, 'r', encoding='string_escape')
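One caveat worth knowing in Python 3: the unicode_escape codec treats the raw bytes as Latin-1, so non-ASCII characters in a UTF-8 file can come out garbled. A minimal sketch of a workaround, assuming you read the file as UTF-8 first and decode the escapes per line afterwards (decode_escapes and read_quotes are hypothetical helper names, and the temporary file only stands in for the real quotes file):

```python
import os
import tempfile


def decode_escapes(text):
    # Encode to Latin-1, turning anything outside Latin-1 into \uXXXX escapes,
    # then let unicode_escape interpret every backslash sequence.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


def read_quotes(path):
    # Read as UTF-8 so non-ASCII text survives, then decode escapes per line.
    # Unlike readlines(), this drops the trailing newline of each line.
    with open(path, "r", encoding="utf-8") as f:
        return [decode_escapes(line.rstrip("\n")) for line in f]


# Demo with a throwaway file standing in for the quotes file.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("\\x02I don't like \\x0307bananas\\x03.\\x02\n")
    tmp.write("Hey, how are you doing?\n")
    tmp.write("Caf\u00e9 time?\n")
    path = tmp.name

quotes = read_quotes(path)
os.remove(path)

for quote in quotes:
    print(repr(quote))
```

Note that any literal backslash in a quote is interpreted as the start of an escape sequence, so this only fits files where backslashes are always meant as escapes.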

Community
zertap
  • If the question can be answered by simply citing an answer on another question, that generally makes the question a duplicate. Please vote/flag to close such questions rather than answering them. – Karl Knechtel Aug 05 '22 at 02:36
1

If you want to print the text to a console with the same formatting, note that UNIX terminals (or which OS do you use?) use ANSI escape sequences, which differ from the IRC codes, so you have to translate the IRC format into ANSI sequences. These links are a starting point:
https://stackoverflow.com/a/287944/2660503
Color text in terminal applications in UNIX
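A rough sketch of such a translation, assuming the text already contains the real control characters (not the literal backslash sequences). The function name irc_to_ansi and the color table are my own: the table is only an approximate mapping from mIRC color indexes to ANSI foreground codes, and only bold, foreground color, and reset are handled here.

```python
import re

# Approximate mIRC color index -> ANSI foreground code (an assumption,
# terminals and IRC clients do not agree on exact shades).
IRC_TO_ANSI_FG = {
    0: 97, 1: 30, 2: 34, 3: 32, 4: 91, 5: 31, 6: 35, 7: 33,
    8: 93, 9: 92, 10: 36, 11: 96, 12: 94, 13: 95, 14: 90, 15: 37,
}

# \x03 optionally followed by "fg" or "fg,bg", or bold (\x02), or reset (\x0f).
TOKEN = re.compile(r"\x03(?:(\d{1,2})(?:,\d{1,2})?)?|\x02|\x0f")


def irc_to_ansi(text):
    bold = False

    def repl(match):
        nonlocal bold
        token = match.group(0)
        if token == "\x02":
            # IRC bold is a toggle; ANSI has separate on (1) and off (22) codes.
            bold = not bold
            return "\x1b[1m" if bold else "\x1b[22m"
        if token == "\x0f":
            bold = False
            return "\x1b[0m"
        if match.group(1) is None:
            # A bare \x03 ends the current color.
            return "\x1b[39m"
        code = IRC_TO_ANSI_FG.get(int(match.group(1)) % 16, 39)
        return "\x1b[%dm" % code

    # Always reset at the end so the formatting does not leak into later output.
    return TOKEN.sub(repl, text) + "\x1b[0m"


print(irc_to_ansi("\x02I don't like \x0307bananas\x03.\x02"))
```

Background colors are matched but ignored to keep the sketch short.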

If you want to print the text without formatting, just strip the codes with a regular expression.
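Something along these lines could do the stripping. It is only a sketch: strip_irc_formatting is a made-up name, and the set of control characters (bold, color, reset, reverse, italic, underline) is my assumption of the common IRC formatting codes, so extend it if your network uses others.

```python
import re

# \x03 with an optional "fg" or "fg,bg" color spec, or one of the
# single-character formatting toggles.
IRC_FORMATTING = re.compile(
    r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]"
)


def strip_irc_formatting(text):
    # Remove every formatting code and keep the visible text.
    return IRC_FORMATTING.sub("", text)


print(strip_irc_formatting("\x02I don't like \x0307bananas\x03.\x02"))
```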

Community
eyeinthebrick
  • thanks but that snippet is from my IRC bot, I just replaced the `MSG` function with `print()`. I'll edit my question to mention this. – zertap May 29 '14 at 13:11
1

The solution, as some people suggested, is to use codecs.open(file, 'r', encoding='unicode_escape'), which looks like this once implemented:

with codecs.open(file, 'r', encoding='unicode_escape') as q:
    quotes = q.readlines()

print(str(random.choice(quotes)))

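If you use regular utf-8 decoding, the line \x02I don't like \x0307bananas\x03.\x02 is read as "\\x02I don't like \\x0307bananas\\x03.\\x02\n". The file contains a literal backslash followed by x02, not the control character itself, so the backslashes stay in the string and repr() shows them doubled. readlines() does not escape anything; utf-8 decoding simply leaves the escape sequences uninterpreted.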
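To see the difference for yourself, here is a small in-memory illustration (no file involved; the raw string stands in for what utf-8 decoding gives you for that line, minus the newline):

```python
# What plain utf-8 decoding yields: literal backslashes, no control characters.
raw = r"\x02I don't like \x0307bananas\x03.\x02"

print(raw[0], raw[1])  # the first two characters are a backslash and an 'x'
print(repr(raw))       # repr() doubles every backslash

# Interpreting the escape sequences turns each \xNN into a single character.
decoded = raw.encode("ascii").decode("unicode_escape")

print(repr(decoded[0]))         # the bold control character
print(len(raw), len(decoded))   # the decoded string is shorter
```

Each of the four escape sequences shrinks from four characters to one, which is why the two lengths differ.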

Andrew