I am reading a text file in Python that, among other things, contains pre-written regexes that will be used for matching later on. The text file is of the following format:

...

--> Task 2

Concatenate and print the strings "Hello, " and "world!" to the screen.

--> Answer

Hello, world!

print(\"Hello,\s\"\s*+\s*\"world!\")

--> Hint 1

You can concatenate two strings with the + operator

...

User input is accepted per task and is either executed in a subprocess to check its return value or matched against a regex. The issue, though, is that Python's file.readline() appears to escape all special characters in the regex string (e.g. backslashes), giving me something that isn't useful.

I tried to read in the file as bytes and decode the lines using the 'raw_unicode_escape' argument (described as producing "a string that is suitable as raw Unicode literal in Python source code"), but no dice:

file = open(filename, 'rb')
for line in file:
  line = line.decode('raw_unicode_escape')
  ...

Am I going about this the completely wrong way?

Thanks for any and all help.

P.S. I found this question as well: Issue while reading special characters from file. However, I still have the same trouble when I use open(filename, 'r', encoding='utf-8').

Zachary Allaun

1 Answer

Python regex patterns are just plain old strings, so there should be no problem with storing them in a file. Perhaps you are seeing doubled backslashes after file.readline() because you are looking at the repr of the line? That is not an issue when you actually use the pattern as a regex, however:

import re

filename = '/tmp/test.txt'
with open(filename, 'w') as f:
    # Raw string literal: the file receives the backslashes exactly as written.
    f.write(r'\"Hello,\s\"\s*\+\s*\"world!\"')

with open(filename, 'r') as f:
    pat = f.readline()
    print(pat)
    # \"Hello,\s\"\s*\+\s*\"world!\"
    print(repr(pat))
    # '\\"Hello,\\s\\"\\s*\\+\\s*\\"world!\\"'
    # A match object is truthy, so the assert passes when a match is found.
    assert re.search(pat, '  "Hello, " +   "world!"')
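
One related detail may matter for a real task file, where each pattern sits on its own line: readline() keeps the trailing newline, and a literal newline inside a pattern has to match a newline in the input. The sketch below is only an illustration under assumptions: the file name patterns.txt, the temporary directory, and the variable names are hypothetical and not taken from the question. It shows that reading leaves the backslashes untouched and that stripping the line terminator before compiling is what makes the pattern usable.

```python
import os
import re
import tempfile

# Hypothetical pattern line, written the way it might appear in the task file.
pattern_text = r'\"Hello,\s\"\s*\+\s*\"world!\"'
sample_input = '  "Hello, " +   "world!"'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'patterns.txt')  # assumed file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write(pattern_text + '\n')

    with open(path, 'r', encoding='utf-8') as f:
        raw_line = f.readline()

# Reading did no escaping: the line is the original text plus the newline.
assert raw_line == pattern_text + '\n'

# print() shows the actual characters; repr() shows a Python literal,
# which is why the backslashes look doubled there.
print(raw_line)
print(repr(raw_line))

# With the newline left in, the pattern also demands a newline in the input.
unstripped_matched = re.search(raw_line, sample_input) is not None

# Strip only the line terminator, then compile.
pat = re.compile(raw_line.rstrip('\n'))
matched = pat.search(sample_input) is not None
```

Using rstrip('\n') rather than a bare strip() is a deliberate choice here, since leading or trailing spaces could be a meaningful part of a pattern.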
unutbu
  • This was a frustrating experience. You're correct, thank you; I was looking at repr(string). – Zachary Allaun Nov 05 '11 at 20:29
  • And you *should* use repr(string), otherwise you are not getting an unambiguous representation of what's actually in your string. You just need to understand what is going on. Merely reading a file does *not* do any escaping or unescaping. – John Machin Nov 05 '11 at 21:29
  • Thanks for the addition, John. I agree. I'm still relatively new to all this, and making mistakes like this definitely gives me a better understanding of what's happening in the background. – Zachary Allaun Nov 05 '11 at 22:01
  • Sheesh I've been banging on this for an hour now. Thanks for the clue stick. – GoingTharn Jul 01 '14 at 19:49