What's wrong with my Python re.sub?

Question

This is my code:

import re

string ='''
{% emoji 'MONEY_BAG' %}<span style="color:#7F6C41;"><a href="{% mobile_url '/inventory/view_item/?category=weapon&inventory_id=%s' inventory_id %}">{{ item.name }}</a>を入手した!</span></span>


'''
a = r'''
{%\s+mobile_url\s+['"]{1}(/inventory/view_item/\?)[^'"]*['"]{1}\s+([^%}]+)\s+%}
'''

def aa(x):
    print x.group(1)
    print x.group(2)
    return ''

string = re.sub(a, aa, string)
print string

And it prints:

{% emoji 'MONEY_BAG' %}<span style="color:#7F6C41;"><a href="{% mobile_url '/inventory/view_item/?category=weapon&inventory_id=%s' inventory_id %}">{{ item.name }}</a>を入手した!</span></span>

I want to print x.group(1) and x.group(2), but nothing is printed from the callback and the string comes back unchanged.

What can I do?

Thanks

Can you please provide an example of your expected output? Also, is it possible that you are trying to [parse html with regex](http://stackoverflow.com/questions/1732348/)? — Björn Pollex, Jun 17 '11 at 07:05
It's not printing anything because the re never finds a match — John La Rooy, Jun 17 '11 at 07:06

score 2 · Answer 1 · answered Jun 17 '11 at 07:07

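The triple-quoted string assigned to a starts and ends with a newline, and those newlines are part of the pattern. The mobile_url tag in your text sits in the middle of a line, so the regex never matches. Put the pattern on a single line: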

a = r'''{%\s+mobile_url\s+['"]{1}(/inventory/view_item/\?)[^'"]*['"]{1}\s+([^%}]+)\s+%}'''
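
As a rough sketch of the fix in context, here is the question's code with the one-line pattern, written for Python 3 (the original uses Python 2 print statements). The names text, pattern, captured and record_and_strip are made up for this illustration; the callback stores the groups in a list instead of printing them directly.

```python
import re

# Sample template text taken from the question.
text = '''
{% emoji 'MONEY_BAG' %}<span style="color:#7F6C41;"><a href="{% mobile_url '/inventory/view_item/?category=weapon&inventory_id=%s' inventory_id %}">{{ item.name }}</a>を入手した!</span></span>
'''

# Same pattern as above: no newline right after the opening quotes
# or right before the closing quotes.
pattern = r'''{%\s+mobile_url\s+['"]{1}(/inventory/view_item/\?)[^'"]*['"]{1}\s+([^%}]+)\s+%}'''

captured = []  # (group 1, group 2) for every match

def record_and_strip(match):
    # Called once per match; the returned string replaces the matched text.
    captured.append((match.group(1), match.group(2)))
    return ''

result = re.sub(pattern, record_and_strip, text)

for path, argument in captured:
    print(path)
    print(argument)
print(result)
```

With the newlines gone, the callback runs once and the whole mobile_url tag is replaced by an empty string, leaving href="" in the result.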

John La Rooy

score 2 · Accepted Answer · answered Jun 17 '11 at 07:15

It's a bad idea to use regex to extract information from HTML. It's much easier with an HTML parser: http://docs.python.org/library/htmlparser.html

Or, if you want to crawl a webpage for more information, you might want to use Scrapy, which is a truly great web crawler framework.
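
For illustration, a minimal sketch of the parser approach, assuming Python 3, where the linked HTMLParser module lives at html.parser. The class name HrefCollector is hypothetical. It only collects the href attribute of each a tag; the template tag inside the attribute comes back as one plain string, so splitting it into its parts would still need a further step.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)

# Sample template text taken from the question.
text = '''
{% emoji 'MONEY_BAG' %}<span style="color:#7F6C41;"><a href="{% mobile_url '/inventory/view_item/?category=weapon&inventory_id=%s' inventory_id %}">{{ item.name }}</a>を入手した!</span></span>
'''

collector = HrefCollector()
collector.feed(text)
collector.close()
hrefs = collector.hrefs
print(hrefs)
```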

naeg

BeautifulSoup should be good for lightweight parsing http://www.crummy.com/software/BeautifulSoup/ – user Jun 17 '11 at 09:00
