
I want to select only the elements whose id matches the pattern "answer-[0-9]*"

I'm using this regex in select: "div[id~=answer-[0-9]*]"

The matching elements are:

<div class="post-text" id="answer-45881">

and

<div class="hidden modal modal-flag" id="answer-flag-modal45881">

What must I change to get only the first one?

luckasx

3 Answers


Based on the example from the official tutorial

[attr~=regex]: elements with attribute values that match the regular expression; 
e.g. img[src~=(?i)\.(png|jpe?g)]

it looks like jsoup only checks whether some part of the attribute value can be matched by the regex (like .png or .jpg in this example), not whether the regex matches the entire value.

To require that the regex matches the entire string, add anchors for the start of the string (^) and the end of the string ($).

Also, instead of * you should probably use + if you want to make the number part mandatory.

So try div[id~=^answer-[0-9]+$]
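
To see why the anchors matter, here is a minimal sketch using plain java.util.regex rather than jsoup itself. It assumes the selector behaves like Matcher.find(), i.e. a match anywhere in the attribute value, which is what the behavior described above suggests. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Assumption: [attr~=regex] acts like find(), so the regex may match any part of the value.
    static boolean findsIn(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String wanted = "answer-45881";
        String unwanted = "answer-flag-modal45881";

        // Unanchored: "answer-" plus zero digits is found in both ids.
        System.out.println(findsIn("answer-[0-9]*", wanted));     // true
        System.out.println(findsIn("answer-[0-9]*", unwanted));   // true

        // Anchored: the whole value has to be "answer-" followed by digits.
        System.out.println(findsIn("^answer-[0-9]+$", wanted));   // true
        System.out.println(findsIn("^answer-[0-9]+$", unwanted)); // false
    }
}
```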

Pshemo
  • When I read the docs, I understood that it matches the entire value. Only by coding did I find out that it matches any part. Thanks, your solution works! – luckasx Feb 21 '15 at 19:46

The * quantifier means "zero or more" times, so answer-[0-9]* also matches the bare "answer-" at the start of the second id. Use the + quantifier instead, meaning "one or more" times, so that at least one digit must follow the hyphen. So, your syntax would be:

div[id~=answer-[0-9]+]
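
A quick way to check this outside jsoup is with java.util.regex, assuming the selector does a find()-style search on the id. This is only a sketch: the class name is made up, and the third id is hypothetical, added to show that without anchors any extra text around the match is still accepted.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Assumption: the selector searches for the regex anywhere in the id, like find().
    static boolean findsIn(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // The first two ids come from the question; the last one is hypothetical.
        String[] ids = {"answer-45881", "answer-flag-modal45881", "answer-45881-edit"};
        for (String id : ids) {
            System.out.println(id
                    + "  with *: " + findsIn("answer-[0-9]*", id)
                    + "  with +: " + findsIn("answer-[0-9]+", id));
        }
    }
}
```

With + the second id is rejected because "answer-" is followed by "f" rather than a digit. The hypothetical third id is still accepted, so add ^ and $ if ids with trailing text must be excluded.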
hwnd

It looks like it checks whether the id contains this pattern, not whether the whole id matches it.

"div[id~=answer-[0-9]*$]"

should work then.
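
As a rough check with java.util.regex (assuming the selector searches like find()), the end anchor alone is enough for the two ids in the question. The sketch below also tries two hypothetical ids to show what a $-only pattern still lets through; the class name is made up.

```java
import java.util.regex.Pattern;

class EndAnchorDemo {
    // Assumption: the selector looks for the regex anywhere in the id, like find().
    static boolean findsIn(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String regex = "answer-[0-9]*$";

        // Ids from the question.
        System.out.println(findsIn(regex, "answer-45881"));           // true
        System.out.println(findsIn(regex, "answer-flag-modal45881")); // false

        // Hypothetical ids: * allows zero digits, and nothing anchors the start.
        System.out.println(findsIn(regex, "answer-"));                // true
        System.out.println(findsIn(regex, "old-answer-7"));           // true
    }
}
```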

Maksim