
Code Iteration #2

Changing var1 to a raw string (stringVar = r'string') worked great. With the code below I now get this exception:

Traceback (most recent call last):
  File "regex_test.py", line 8, in <module>
    pattern = re.compile(var2 + "(.*)")
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis

--

#!/usr/bin/python

import re

var1 = r'\\some\String\to\Match'
var2 = '\\\\some\\String\\'

pattern = re.compile(var2 + "(.*)")
found = pattern.match(var1, re.IGNORECASE)

if found:
    print "YES"
else:
    print "NO"

I am trying to include a variable in my regular expression. This question is related to this other question, but differs slightly by using a compiled pattern vs the variable within the match. According to everything I've read, the example code below should work.

#!/usr/bin/python

import re

var1 = re.escape('\\some\String\to\Match') # A windows network share
var2 = "\\\\some\\String\\"

print var1 # Prints \\some\\String\ o\\Match
print var2 # Prints \\some\String\

pattern = re.compile(var2)
found = pattern.match(var1 + "(.*)", re.IGNORECASE)

if found:
    print "YES"
else:
    print "NO"

When I print out my variables I see some weird behavior. I thought re.escape would escape all the necessary characters in a string.

When I execute the code in Python 2.7 on Ubuntu 12.4.1 I get the following exception:

Traceback (most recent call last):
  File "regex_test.py", line 11, in <module>
    pattern = re.compile(var2)
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)

What am I missing that is causing the exception to be thrown?

Wesley

3 Answers


In a normal Python string literal, \t is a single character (a tab). Use a raw string literal, r'...', to avoid that problem (search for 'raw string' to learn more).

The same is true of \\: in a normal string literal it is a single backslash character.

To see this for yourself, print the string you pass to re.escape before escaping it. It should make sense.

This is what you're looking for:

var1 = re.escape(r'\\some\String\to\Match')
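
The difference between the two kinds of literal can be checked directly. This is a minimal sketch, written for Python 3 (the thread itself uses Python 2.7); the names `normal` and `raw` are only illustrative and do not come from the question.

```python
# A normal literal interprets "\t" as one TAB character;
# a raw literal keeps the backslash and the "t" as two characters.
normal = 'String\to\\Match'   # "\t" becomes a tab, "\\" becomes one backslash
raw = r'String\to\Match'      # every backslash is kept as typed

print(repr(normal))
print(repr(raw))
print(len(normal), len(raw))  # the raw version is one character longer
```

Printing `repr()` of each value makes the tab visible as `\t`, which is easier to spot than the whitespace that a plain `print` produces.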
Chris Pfohl
  • Please see Code Iteration #2. Using the raw string worked to get rid of the \t issue I was having – Wesley Oct 22 '12 at 20:48
  • Please `print` the value before you compile it and post it for us. I'll bet you'll immediately see what went wrong. (BTW: the value I want is `var2 + '(.*)'`.) – Chris Pfohl Oct 22 '12 at 21:22

re.escape is used for escaping a string to use as a regular expression, but you escape var1 and then use var2 as the regular expression.

I think the following is what you are trying to accomplish:

var1 = r'\\some\String\to\Match'
var2 = re.escape('\\\\some\\String\\')
pattern = re.compile(var2 + '(.*)', re.IGNORECASE)
found = pattern.match(var1)

Note that r'\\some\String\to\Match' is a raw string literal. You cannot write var2 as a raw string literal, because var2 must end in a single backslash and a raw string literal cannot end in an odd number of backslashes.
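
To make the cause of the "unbalanced parenthesis" error concrete, here is a small sketch written for Python 3 (the thread uses Python 2.7). The names `share`, `prefix`, `unescaped_ok`, and `misplaced` are illustrative assumptions, not part of the original code. It compares the unescaped and escaped prefix, and shows what happens when a flag is passed to `match()` on a compiled pattern instead of to `re.compile`.

```python
import re

share = r'\\some\String\to\Match'   # text to search: \\some\String\to\Match
prefix = '\\\\some\\String\\'       # literal prefix:  \\some\String\

# Unescaped: the prefix's trailing backslash turns "(" into a literal
# character, so the ")" that follows has nothing to close.
try:
    re.compile(prefix + '(.*)')
    unescaped_ok = True
except re.error as exc:
    unescaped_ok = False
    print('unescaped prefix rejected:', exc)

# Escaped: every backslash in the prefix is matched literally, and the
# flag is passed to re.compile rather than to match().
pattern = re.compile(re.escape(prefix) + '(.*)', re.IGNORECASE)
found = pattern.match(share)
print(found.group(1))

# On a compiled pattern the second argument of match() is a start
# position, so a flag passed there is read as an offset (IGNORECASE is 2).
misplaced = pattern.match(share, re.IGNORECASE)
print(misplaced)
```

The last call returns no match because matching starts at offset 2, after the two leading backslashes that the pattern requires.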

Andrew Clark
  • The end pattern should be \\some\String\(.*), which should match var1, since I am capturing any and all chars after 'String\'. With the code provided, I am still getting the unbalanced parenthesis exception when executing – Wesley Oct 22 '12 at 20:56
  • The end pattern is what you pass into `re.compile`; you should pass the string you are trying to match into `pattern.match`. – Andrew Clark Oct 22 '12 at 21:06
  • @Wesley did you copy and paste my code exactly? It should work fine. – Andrew Clark Oct 22 '12 at 21:26
  • I guess I somehow messed something up. That worked perfectly. Thank you – Wesley Oct 22 '12 at 21:31

re.escape does not make your life easier in this case:

In [68]: re.escape('\\\\some\\String\\')
Out[68]: '\\\\\\\\some\\\\String\\\\'

In [71]: re.escape(r'\\some\String\to\Match')
Out[71]: '\\\\\\\\some\\\\String\\\\to\\\\Match'

Clearly, there are too many backslashes there.

It is possible to do this with just raw strings:

In [62]: import re

In [63]: re.match(r'\\\\some\\String\\(.*)', r'\\some\String\to\Match').group(1)
Out[63]: 'to\\Match'
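
When reading the session output above, keep in mind that the interactive prompt shows `repr()` of each string, which doubles every backslash for display. The following sketch, written for Python 3 with the illustrative names `escaped` and `match`, prints the escaped prefix both ways and compares it with the hand-written raw-string pattern.

```python
import re

escaped = re.escape('\\\\some\\String\\')

print(repr(escaped))  # display form: each backslash is shown doubled
print(escaped)        # the actual pattern text

# The escaped prefix holds the same characters as the raw-string pattern.
print(escaped == r'\\\\some\\String\\')

match = re.match(escaped + '(.*)', r'\\some\String\to\Match')
print(match.group(1))
```

Under these assumptions the two approaches produce the same pattern text, so the choice between them is mostly about readability.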
unutbu