
Say I have this HTML fragment:

            <td width="25%" style="text-align:right;" >
                <span style="font-family: Arial; font-size: 12px;">
                <strong>Regular price:</strong></span>
            </td>                         
            <td width="25%"  style="text-align:center;" >
                <span style="font-family: Arial; font-size: 12px;>
                <strong>11,100.00 USD</strong></span>
            </td>
            <td width="25%" style="text-align:right;" >
                <span style="font-family: Arial; font-size: 12px;">
                <strong>Web price:</strong></span>
            </td>                         
            <td width="25%"  style="text-align:center;" >
                <span style="font-family: Arial; font-size: 12px;>
                <strong>9,100.00 USD</strong></span>
            </td>

How would I get the first amount (11,100.00) using regex? I tried

/Regular price.+(\d[^\s]+)\sUSD/is

but it's not working: it returns '00'. Obviously I'm a newbie at regex, and I'm hoping to get by without picking up a book.

And I HAVE to use regex to parse this HTML because that particular site was designed by someone who doesn't know about classes or IDs.

DMIL
  • There is a legendary answer available here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Parakram Majumdar Mar 27 '12 at 22:48
  • Did you read the last sentence in my question? This particular HTML doesn't use any classes or IDs, so I have no choice but to use regex. And regex can parse strings, which is what HTML is once I put it in a string variable. Of course, I've read that stupid answer previously, but if you know of a better way to parse HTML without any classes or IDs, let me know. – DMIL Mar 27 '12 at 22:51
  • Using **jQuery** you can get the result much easier and faster than going with the Regex option. – inhan Mar 27 '12 at 22:56
  • So how would I use jQuery to get that amount in the string above? – DMIL Mar 27 '12 at 23:01
  • @DMIL - As for a jQuery solution, you can use the `:contains` selector to find the first td that contains "USD", for example... http://api.jquery.com/contains-selector/ – Steve Wortham Mar 27 '12 at 23:37
  • `$('strong:contains("Regular price:")').closest('td').next().find('strong').text().match(/[\d\.,]+/)[0];` or `$('strong:contains("Regular price:")').closest('td').next().find(':contains(USD)').text().match(/[\d\.,]+/)[0];` – inhan Mar 27 '12 at 23:47
  • Oh wow, that's like... wow... I didn't know jQuery could actually do that. – DMIL Mar 28 '12 at 01:43

3 Answers


You do not have to use regex. You can always search the DOM to get to the element you want. However, in this particular case (you're not parsing HTML, you're just searching for a certain string), your only issue is that your regex is greedy: .+ consumes as much as it can, so the match runs on to the last "USD" in the fragment and the capture group is left with only the final "00". Adding a ? after the quantifier makes it lazy (it stops at the first position where the rest of the pattern can match, instead of taking the longest possible match):

/Regular price.+?(\d[^\s]+)\sUSD/is
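To see the difference side by side, here is a minimal JavaScript sketch that runs both patterns against the fragment from the question. The variable names are made up for illustration, the indentation of the fragment is trimmed, and it assumes an engine that supports the `s` (dotAll) flag, which any recent Node.js or browser does.

```javascript
// Fragment from the question, with indentation trimmed. The unclosed
// style attribute on the price cells is kept as in the original page.
const html = `
<td width="25%" style="text-align:right;">
    <span style="font-family: Arial; font-size: 12px;">
    <strong>Regular price:</strong></span>
</td>
<td width="25%" style="text-align:center;">
    <span style="font-family: Arial; font-size: 12px;>
    <strong>11,100.00 USD</strong></span>
</td>
<td width="25%" style="text-align:right;">
    <span style="font-family: Arial; font-size: 12px;">
    <strong>Web price:</strong></span>
</td>
<td width="25%" style="text-align:center;">
    <span style="font-family: Arial; font-size: 12px;>
    <strong>9,100.00 USD</strong></span>
</td>`;

// Greedy: .+ runs to the end of the string and backtracks only as far
// as needed, so the group captures the tail of the LAST price.
const greedy = /Regular price.+(\d[^\s]+)\sUSD/is;

// Lazy: .+? grows one character at a time, so the group captures the
// FIRST run of non-space characters that is followed by " USD".
const lazy = /Regular price.+?(\d[^\s]+)\sUSD/is;

const greedyPrice = html.match(greedy)[1];
const lazyPrice = html.match(lazy)[1];

console.log(greedyPrice); // 00
console.log(lazyPrice);   // 11,100.00
```

Note that the lazy version first tries the digits in `25%` and `12px;>` and rejects them, because neither is followed by whitespace and then `USD`.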
Paul
  • I'm out of votes, but your answer is correct! http://www.rubular.com/r/vONnrbLbIL – mellamokb Mar 27 '12 at 22:50
  • @mellamokb Lol, you must've been on SO a lot today. Good job :) Votes reset in a little over an hour, if you want to keep my answer in mind until then and are still going to be online ;) – Paul Mar 27 '12 at 22:53
  • How would I search the DOM (say with XPath) to get the contents of this particular element? The HTML in question doesn't use IDs or classes, so other than counting (and hoping the count remains the same even if they add something to the site), I don't know how. – DMIL Mar 27 '12 at 22:57

Use .+? instead of .+. The question mark makes the quantifier lazy, so it matches as few characters as possible.
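Stripped of the HTML, the effect can be shown on a one-line string. This is only an illustrative sketch; the sample text and variable names are invented for the example.

```javascript
// Two amounts on one line; we want the first one.
const text = "a 10 USD b 25 USD";

// Greedy .+ swallows everything it can and gives back only one digit.
const greedyMatch = text.match(/a.+(\d+) USD/);

// Lazy .+? stops at the first place the rest of the pattern fits.
const lazyMatch = text.match(/a.+?(\d+) USD/);

console.log(greedyMatch[1]); // 5
console.log(lazyMatch[1]);   // 10
```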

kappa

TXR solution: ( http://www.nongnu.org/txr )

Script in data.txr:

@(skip)
            <td @(skip)>
@(skip)
                <strong>Regular price:</strong></span>
            </td>
            <td @(skip)>
                <span @(skip)>
                <strong>@price USD</strong></span>
            </td>

Run:

$ txr data.txr data.html
price="11,100.00"
Kaz