I have a string. Let's call it 'test'. I want to test a match for this string, but only using the backref of a regex.

Can I do something like this:

import re

for line in f.readlines():
   if '<a href' in line:
      if re.match('<a href="(.*)">', line) == 'test':
         print 'matched!'

This, of course, doesn't work, but I think I might be close. Basically, the question is: how can I get re to return only the backref for comparison?

jml
  • I recommend [Rubular](http://rubular.com/) for developing regex -- it's a huge time saver. Here's another question where I helped someone with a similar pattern: http://stackoverflow.com/questions/4716787/problem-with-ruby-regular-expression – Kyle Wild Jan 20 '11 at 01:37

1 Answer

re.match matches only at the beginning of the string.

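import re

def url_match(line, url):
    # re.match anchors at the start of the line; the named group captures the href value.
    match = re.match(r'<a href="(?P<url>[^"]*?)"', line)
    return match is not None and match.group('url') == url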

example usage:

>>> url_match('<a href="test">', 'test')
True
>>> url_match('<a href="test">', 'te')
False
>>> url_match('this is a <a href="test">', 'test')
False

If the pattern could occur anywhere in the line, use re.search.

def url_search(line, url):
    # re.search scans the whole line, so the tag may appear anywhere in it.
    match = re.search(r'<a href="(?P<url>[^"]*?)"', line)
    return match is not None and match.group('url') == url

example usage:

>>> url_search('<a href="test">', 'test')
True
>>> url_search('<a href="test">', 'te')
False
>>> url_search('this is a <a href="test">', 'test')
True
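
Applied to the loop in the question, the comparison has to be made against the captured group rather than against the match object itself. Here is a minimal sketch, assuming the lines come from any iterable (an open file works too); the helper name `lines_with_url` and the sample data are hypothetical, not part of the original code.

```python
import re

# Compile once; group 1 captures the value of the href attribute.
HREF_RE = re.compile(r'<a href="([^"]*)"')

def lines_with_url(lines, url):
    """Yield each line whose first <a href="..."> value equals url (hypothetical helper)."""
    for line in lines:
        match = HREF_RE.search(line)
        # search() returns None when the line contains no link at all.
        if match is not None and match.group(1) == url:
            yield line

# Made-up sample input standing in for f.readlines().
sample = [
    '<p>no link here</p>',
    'see <a href="test">this</a>',
    '<a href="other">that</a>',
]
print(list(lines_with_url(sample, 'test')))
```

`match.group(1)` returns only the text captured by the first pair of parentheses, which is what the question calls the backref.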

N.B.: If you are trying to parse HTML using a regex, read "RegEx match open tags except XHTML self-contained tags" before going any further.

mouad
  • Great. Thanks for your reply. How would I replace the text and write the file? – jml Jan 20 '11 at 21:23
  • I should mention that although I read that post, it also says that you can use this method for a limited use case, which is what I have. I don't want to build an all-encompassing parser. – jml Jan 20 '11 at 21:29
  • @jml: glad I was able to help :). I don't know exactly what you mean by your question, but for replacing, just use `re.sub` rather than `re.match`; writing to a file should be obvious, right? :) You can post another question if you need more detail; that way you'll get more help :) – mouad Jan 20 '11 at 22:12
  • not totally obvious to me. :/ my experience w/ regex thus far (in other langs) has been one of a matching tool, rather than a text replacement tool. i was not aware that the temp buffer which loads via 'r+' was modifiable. thanks again. – jml Jan 20 '11 at 23:23
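
To follow up on the replacement question in the comments: `re.sub` accepts a function as the replacement, so the captured href value can be compared before deciding whether to rewrite it. This is a rough sketch under assumptions; the names `replace_href` and `rewrite_file` are hypothetical, and reading the whole file and then reopening it for writing is just one simple approach, not the only one.

```python
import re

# Three groups: the opening text, the href value, and the closing quote.
HREF_RE = re.compile(r'(<a href=")([^"]*)(")')

def replace_href(text, old_url, new_url):
    """Return text with every href equal to old_url changed to new_url."""
    def swap(match):
        # Rewrite only exact matches of the captured value; leave other links as they are.
        if match.group(2) == old_url:
            return match.group(1) + new_url + match.group(3)
        return match.group(0)
    return HREF_RE.sub(swap, text)

def rewrite_file(path, old_url, new_url):
    """Read the file at path, replace matching hrefs, and write the result back."""
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(replace_href(text, old_url, new_url))
```

Comparing inside the callback avoids building `old_url` into the pattern, so the URL does not need to be escaped for regex metacharacters.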