4

First of all, this is not a dupe of this question.

In JavaScript, this expression seems to be evaluated correctly:

\\/(omniture|mbox|hbx|omniunih)(.*)?

If I pass it to Python's re module, it fails. The following raises an error:

import re
re.compile (u'\\/(omniture|mbox|hbx|omniunih)(.*)?')

In [101]: re.compile (u'\\/(omniture|mbox|hbx|omniunih)(.*)?')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
/home/fakk/spider.io/1/<ipython-input-101-b5b19eb3b66e> in <module>()
----> 1 re.compile (u'\\/(omniture|mbox|hbx|omniunih)(.*)?')

/usr/lib/python2.7/re.pyc in compile(pattern, flags)
    188 def compile(pattern, flags=0):
    189     "Compile a regular expression pattern, returning a pattern object."
--> 190     return _compile(pattern, flags)
    191 
    192 def purge():

/usr/lib/python2.7/re.pyc in _compile(*key)
    242         p = sre_compile.compile(pattern, flags)
    243     except error, v:
--> 244         raise error, v # invalid expression
    245     if len(_cache) >= _MAXCACHE:
    246         _cache.clear()

error: nothing to repeat

Python complains about the (.*)? part, which I don't understand myself.

My questions are:

  1. What does (.*)? do in JS? Match zero or one (?) of zero or more (*) chars (.)? What's the point?
  2. How can I translate it to Python?
Jir
  • "Python complains about the (.*)? part" - Can you post the error message? (I guess it's "error: nothing to repeat", right?) – Mark Byers Jan 04 '12 at 10:38
  • You nailed it :) Actually, there isn't really anything to repeat. Other expressions that fail are: `\/webtrends(.*)?\.js`, `foresee-(trigger(.*)?|alive|analytics(.*)?)\.js`, `everestjs\.net|pixel([0-9]*)?\.everesttech\.net`. – Jir Jan 04 '12 at 10:45
  • In `(.*)?`, the `.` stands for any character, and the `*` means the preceding item (here, any character) may repeat any number of times. The parentheses make this a group, and the `?` means the preceding item is optional (here, the whole group may be absent). – noob Jan 04 '12 at 10:45

2 Answers

6

The question mark is superfluous. As you suspect yourself, it doesn't really make any sense; remove it and you should be in business.
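
As a rough sketch of "remove it", the snippet below drops the trailing `?` from groups of the form `(.*)?` or `([...]*)?`, which covers the expressions quoted in the question and its comments. The helper name `drop_redundant_optional` and the sample path `/omniture/s_code.js` are made up for illustration. The rewrite only touches groups whose whole body is a single starred atom, because only then is the `?` guaranteed to be redundant; it also assumes the opening parenthesis is not itself escaped.

```python
import re

# A group whose entire body is one starred atom, followed by "?":
#   (.*)?   or   ([0-9]*)?
# The atom is either "." or a bracketed character class with no "]" inside.
_REDUNDANT_OPTIONAL = re.compile(r'\((\.|\[[^\]]+\])\*\)\?')


def drop_redundant_optional(pattern):
    """Rewrite (X*)? as (X*) for the simple atoms handled above (hypothetical helper)."""
    return _REDUNDANT_OPTIONAL.sub(r'(\1*)', pattern)


original = r'\/(omniture|mbox|hbx|omniunih)(.*)?'
fixed = drop_redundant_optional(original)
print(fixed)  # \/(omniture|mbox|hbx|omniunih)(.*)

# The rewritten pattern compiles and still captures the trailing text.
match = re.compile(fixed).search('/omniture/s_code.js')
print(match.groups())  # ('omniture', '/s_code.js')
```

A group such as `(ab*)?` is left alone on purpose: there the `?` does change what can match, since `ab*` cannot match the empty string.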

zrvan
  • Just to clarify, it's superfluous, but not strictly incorrect, compare this `.?.?.*`. It's a bit weird that Python can't handle it. – zrvan Jan 04 '12 at 10:54
  • Actually, there's something a bit more subtle going on here than the `?` being superfluous. This `?` is also superfluous: `(.?)?`, but Python doesn't seem to mind. I tried looking through the sre_compile/sre_parse code to figure out what the deal is, but it's a little too much effort for a Stack Overflow question ;) – jjm Mar 19 '12 at 07:27
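
One way to see which of the patterns mentioned in these comments a given interpreter accepts is to try compiling each and catch `re.error`. This is only a probe, with `compiles` as a hypothetical helper name; the result for `(.*)?` is the "nothing to repeat" failure on the Python 2.7 shown in the question and may differ on other versions, so no expected output is listed.

```python
import re


def compiles(pattern):
    """Return True if re.compile accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Patterns taken from the question and the comments above.
candidates = [r'(.*)?', r'(.?)?', r'.?.?.*', r'([0-9]*)?']

for candidate in candidates:
    print('%-12s compiles: %s' % (candidate, compiles(candidate)))
```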
2

Your regular expression does not make sense: the ? at the end of your string is not needed and will in fact never change what is matched, because (.*) can already match the empty string. In addition, I suggest you use r'' to make your expression easier to read:

import re
my_regex = re.compile(r'\/(omniture|mbox|hbx|omniunih)(.*)')
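
To illustrate the raw-string suggestion, this small check compares the doubled-backslash spelling from the question with the raw-string spelling. It also assumes, as a side note, that the `/` needs no escaping in Python's re at all, since it is not a special character there. The sample string is made up for illustration.

```python
import re

# Same pattern text, spelled with a doubled backslash and as a raw string.
escaped = u'\\/(omniture|mbox|hbx|omniunih)(.*)'
raw = r'\/(omniture|mbox|hbx|omniunih)(.*)'
print(escaped == raw)  # True

# "/" is an ordinary character in Python regular expressions,
# so the backslash in front of it can be dropped.
unescaped = r'/(omniture|mbox|hbx|omniunih)(.*)'

sample = 'http://example.com/mbox/standard'  # hypothetical input
print(re.search(raw, sample).groups())        # ('mbox', '/standard')
print(re.search(unescaped, sample).groups())  # ('mbox', '/standard')
```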
NobRuked