Here is a snippet of the page:

<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">

I want to extract the 34980 and the 100329. There could be multiple products and variants. I will be using Python.

Thanks

Chuck Dickens

2 Answers

The link @Kirill Polishchuk gives is a favorite on SO. It states clearly why you should not use regex for this.

If, however, you still insist on using a regex, try:
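For reference, the non-regex route could look something like the sketch below, which uses `html.parser` from Python's standard library. The class name `RowIdCollector` and the `page` string are assumptions made up for illustration; the real page would be fed in the same way.

```python
from html.parser import HTMLParser


class RowIdCollector(HTMLParser):
    """Hypothetical helper: collect (prefix, number) from every <tr id="...">."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        row_id = dict(attrs).get("id") or ""
        prefix, sep, number = row_id.rpartition("_")
        if sep and number.isdigit():
            self.ids.append((prefix, number))


# Sample input taken from the question.
page = '''<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">'''

collector = RowIdCollector()
collector.feed(page)
print(collector.ids)  # [('product', '34980'), ('variant', '100329')]
```

Rows without an id, or with an id that does not end in `_<digits>`, are simply skipped.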

<tr[^>]*id="([^"]*)"[^>]*>

The full id value (for example, product_34980) is now in capture group 1; strip the prefix to get the number.
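In Python, that pattern might be applied roughly as follows. This is only a sketch: the names `TR_ID` and `page` are assumptions, and `re.findall` is used so that every row on the page is picked up, not just the first.

```python
import re

# The pattern from above: capture the id attribute of each <tr> tag.
TR_ID = re.compile(r'<tr[^>]*id="([^"]*)"[^>]*>')

# Sample input taken from the question.
page = '''<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">'''

# With a single capture group, findall returns the captured strings.
full_ids = TR_ID.findall(page)

# Keep only the part after the last underscore.
numbers = [value.rsplit("_", 1)[-1] for value in full_ids]

print(full_ids)  # ['product_34980', 'variant_100329']
print(numbers)   # ['34980', '100329']
```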

gwillie
  • @hwnd, nice edit (I'm lazy sometimes, which isn't good for SO policy), but I'm still at a loss. Was the regex wrong? It seemed to do what the OP asked. I certainly don't mind constructive criticism. – gwillie Oct 16 '13 at 09:14
  • Thanks, I think I will take another approach. – Chuck Dickens Oct 16 '13 at 13:03
>>> import re
>>> p = re.compile(r'\d+')
>>> m = p.search('<tr id="product_34980" class="even">')
>>> m.group()
'34980'
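Note that `search` returns only the first match, and a bare `\d+` would also match digits elsewhere in the tag. For a page with several products and variants, a tighter pattern with `re.findall` may be a better fit. The sketch below assumes the ids always look like `product_<digits>` or `variant_<digits>`; the names `ROW_ID`, `page`, and `ids` are made up for illustration.

```python
import re

# Assumption: ids are always "product_<digits>" or "variant_<digits>".
ROW_ID = re.compile(r'id="(product|variant)_(\d+)"')

# Sample input taken from the question.
page = '''<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">'''

ids = {"product": [], "variant": []}
# With two capture groups, findall returns (kind, number) tuples.
for kind, number in ROW_ID.findall(page):
    ids[kind].append(number)

print(ids)  # {'product': ['34980'], 'variant': ['100329']}
```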
Ankur Agarwal