Trying to find text that contains a price with regular expressions

Question

So let's say the text I have is:

 <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>

I am trying to make a pattern that will catch all the text between the span tags. So far I've managed to catch the first span with no problem, but I have trouble with the rest of them.

Here is what I got so far:

\>(.*?\$\s*?(\d+\.?\d+).*?)\<

I thought of using something like \>\r*?\n*?(.*?\$\s*?(\d+\.?\d+).*?)>\r*?\n*?\< to catch the others, but it doesn't work.
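A quick way to see why the pattern above only catches the first span: in JavaScript (used here because one of the answers uses it; PHP's PCRE treats `.` the same way by default), `.` does not match a line break unless the `s` (dotAll) flag is set. This sketch runs the question's pattern against the sample HTML, once without and once with `s`. The `g` flag and the `firstGroups` helper are illustrative additions.

```javascript
// The sample HTML from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// The pattern from the question. Without the s flag, "." stops at line
// breaks, so ".*?" can never reach a "$" or "<" on another line.
const original = />(.*?\$\s*?(\d+\.?\d+).*?)</g;

// Same pattern with the s flag: "." now crosses newlines, but it also
// crosses "<" and ">", so each match swallows neighbouring tags.
const dotAll = />(.*?\$\s*?(\d+\.?\d+).*?)</gs;

// Collect capture group 1 of every match.
const firstGroups = (re) => [...html.matchAll(re)].map((m) => m[1]);

console.log(firstGroups(original)); // [ 'one something 1 $2502' ]
console.log(firstGroups(dotAll).length); // 3, but each group contains "<span>"
```

Turning on `s` alone is therefore not enough: the matches start at an earlier `>` and include the markup in between, which is why a negated class such as `[^<>]*` is a better fit than `.*?` here.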

score 4 · Accepted Answer · answered Dec 25 '12 at 18:24

You shouldn't be using regex to match markup languages; as soon as nested tags are involved, things get hairy very quickly. That said, for your example, where there is only plain text between two innermost tags, you could give this a try:

>[^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*<

That will match any text between two >...< delimiters (unless it contains angle brackets itself) that contains at least one number preceded by a $. If there is more than one, it'll capture the last one.

Explanation:

>       # Match >
[^<>]*  # Match anything besides < or >
\$      # Match $
\s*     # Match optional whitespace
(       # Match and capture...
 \d+    # a number
 (?:    # possibly followed by:
  \.\d* #  a dot and optional digits
 )?     # but make that part optional.
)       # End of capturing group
[^<>]*  # Match anything besides < or >
<       # Match <
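A minimal JavaScript sketch of the pattern above, run against the question's sample. The `g` flag (so `matchAll` returns every match) and the `findPrices` helper are illustrative additions; in PHP, `preg_match_all` would play the same role. It also checks the claim that the last of several prices between the same two tags is the one captured.

```javascript
// The sample HTML from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// The answer's pattern plus the g flag. [^<>]* can cross line breaks
// (a negated class matches "\n"), but it never crosses a tag delimiter.
const pricePattern = />[^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*</g;

// Return capture group 1 (the number after "$") of every match.
function findPrices(text) {
  return [...text.matchAll(pricePattern)].map((m) => m[1]);
}

console.log(findPrices(html)); // [ '2502', '25102', '2102' ]

// The greedy [^<>]* before \$ backtracks only as far as the last "$",
// so with several prices between two tags the last one is captured.
console.log(findPrices('<b>was $30, now $19.99</b>')); // [ '19.99' ]
```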

answered Dec 25 '12 at 18:24

Tim Pietzcker

Which other approach would you suggest instead of matching the markup? – Neta Meta Dec 25 '12 at 18:33
I've added () to yours, `>([^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*)<`, and it seems to catch everything I needed. I was wondering, though, if you could elaborate on a better way. – Neta Meta Dec 25 '12 at 18:50
Usually, you would use a DOM parser to identify relevant tags. Then you can use regex to find (for example) prices within the tag's text contents. But as I said, for such a simple example as yours, regex is probably sufficient. Just be aware of its limitations. – Tim Pietzcker Dec 25 '12 at 19:02
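One way the "DOM parser first, regex second" approach might look, sketched in JavaScript. `pricedElements` is a hypothetical helper: it assumes the elements were already selected by a DOM parser (in a browser, `DOMParser` plus `querySelectorAll`) and only applies a price regex, adapted from the accepted answer, to each element's `textContent`. Plain objects stand in for the spans so the sketch runs without a browser.

```javascript
// Hypothetical helper: keep only elements whose text contains a "$"
// price, returning the normalized text and the first price found.
function pricedElements(elements) {
  const price = /\$\s*(\d+(?:\.\d*)?)/; // no angle brackets needed on plain text
  const results = [];
  for (const el of elements) {
    // Collapse the whitespace that indentation leaves in textContent.
    const text = el.textContent.replace(/\s+/g, ' ').trim();
    const m = price.exec(text);
    if (m) results.push({ text, price: m[1] });
  }
  return results;
}

// In a browser, the parsing step could look like this
// (DOMParser is a browser API; Node has no built-in equivalent):
//   const doc = new DOMParser().parseFromString(html, 'text/html');
//   pricedElements(doc.querySelectorAll('span'));

// Stand-ins for the four spans from the question.
const fakeSpans = [
  { textContent: 'one something 1 $2502' },
  { textContent: '\n        one something 2\n    ' },
  { textContent: 'one something 3 $25102\n    ' },
  { textContent: '\n    one something 4 $2102' },
];
console.log(pricedElements(fakeSpans));
```

Because the parser has already separated the text from the tags, the regex no longer has to care about `<` and `>` at all, which is the main practical gain.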

score 1 · Answer 2 · answered Dec 25 '12 at 18:25

<?php 
$string = ' <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>';
// U: quantifiers are lazy by default, s: "." also matches newlines, i: case-insensitive
preg_match_all('~<span>(.+)</span>~Usi', $string, $matches);
print_r($matches[1]);
?>

Works fine for me.

answered Dec 25 '12 at 18:25

flxapps

I am not trying to find everything within the span; I am trying to find something that matches the price pattern. – Neta Meta Dec 25 '12 at 18:33
Oh, okay. I don't know exactly what your actual source looks like, but on your example source the following expression works; you might want to give it a try: `preg_match_all('~.*\$([0-9]+)\s*~Usi', $string, $matches);` – flxapps Dec 25 '12 at 18:47
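One thing worth checking about that expression: PHP's `U` modifier inverts the greediness of every quantifier, not only `.*`, so `([0-9]+)` should become lazy and stop after a single digit. JavaScript has no `U` flag, so this sketch approximates it by spelling out the lazy quantifiers by hand; `asWithU`, `greedyDigits` and `captures` are illustrative names, not anything from PHP.

```javascript
// The sample HTML from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// Approximation of ~.*\$([0-9]+)\s*~Us: under U every quantifier is lazy,
// including [0-9]+, which then matches just one digit.
const asWithU = /.*?\$([0-9]+?)\s*?/gs;

// Keeping the digit run greedy captures the whole number.
const greedyDigits = /\$([0-9]+)/g;

// Collect capture group 1 of every match.
const captures = (re) => [...html.matchAll(re)].map((m) => m[1]);

console.log(captures(asWithU)); // [ '2', '2', '2' ]
console.log(captures(greedyDigits)); // [ '2502', '25102', '2102' ]
```

If that reading of `U` is right, dropping the modifier (for example `'~\$([0-9]+)~'`) or writing `[0-9]+?` while keeping `U` (under `U`, a trailing `?` makes a quantifier greedy again) should put the full numbers into `$matches[1]`.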

davidrac · Answer 3 · 2012-12-25T18:44:57.283

0

Just picking everything within the span is simple: <span>([^<]*)<\/span>

Let me know if this works for you.

If you only want the price: <span>[^$<]*(\$\d+)[^<]*<\/span> should work
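Both patterns, sketched in JavaScript against the question's sample to show which spans each one picks up. The `g` flag (so `matchAll` returns every match) and the `captures` helper are illustrative additions.

```javascript
// The sample HTML from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// Everything between <span> and </span>, as long as it contains no "<".
const allSpans = /<span>([^<]*)<\/span>/g;

// Only spans with "$" followed by digits. [^$<]* cannot pass a "$",
// so it is the first "$" in each span that gets captured.
const spanPrice = /<span>[^$<]*(\$\d+)[^<]*<\/span>/g;

// Collect capture group 1 of every match.
const captures = (re) => [...html.matchAll(re)].map((m) => m[1]);

console.log(captures(allSpans).length); // 4
console.log(captures(spanPrice)); // [ '$2502', '$25102', '$2102' ]
```

The span without a price ("one something 2") is skipped by the second pattern, so it does not require knowing in advance which spans hold a price. Since `\d+` stops at a decimal point, a price such as `$19.99` would come back as `$19`; the accepted answer's `(\d+(?:\.\d*)?)` covers that case.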

edited Dec 25 '12 at 18:44

answered Dec 25 '12 at 18:22

davidrac

I cannot pick up everything within the span, as I don't know which span will have the price pattern. – Neta Meta Dec 25 '12 at 18:30
So you can adapt: `[^$<]*(\$\d+)[^<]*<\/span>` – davidrac Dec 25 '12 at 18:44

elclanrs · Answer 4 · 2012-12-25T18:54:11.143

0

I wouldn't use a regex for this. If you add an id to your div, you can easily grab the spans' text using the DOM tools:

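var div = document.getElementById('mydiv'); // 'mydiv' is the id you would add to the div

// Keep only the <span> children of the div and collect their text.
var text = [].slice.call( div.childNodes ).filter(function( node ){
  return node.nodeName === 'SPAN';
}).map(function( span ){ return span.innerText; });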

console.log( text ); //=> ["one something 1 $2502", "one something 2", "one something 3 $25102", "one something 4 $2102"]

Edit: With jQuery, what you can do is find a pattern. For example, if you know that every span you want to grab is followed by a br tag, you could find them like this:

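// Spans whose next sibling is a <br>
var $spans = $('span').filter(function(){
  return $(this).next('br').length;
});

// .get() turns the jQuery collection returned by .map() into a plain array
var text = $spans.map(function(){
  return $(this).text();
}).get();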

If the pattern is not unique then you might have to use regex after all...

edited Dec 25 '12 at 18:54

answered Dec 25 '12 at 18:36

elclanrs

I cannot change the HTML I receive. – Neta Meta Dec 25 '12 at 18:37
Why not? Couldn't you add a class instead of an id? Regex is _not_ a good solution for this problem. Check out this famous answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. – elclanrs Dec 25 '12 at 18:38
The HTML is something I scrape from external pages. – Neta Meta Dec 25 '12 at 18:41
Oh, I see... then maybe you could put it into a container of your own so you can filter it with the DOM tools. Otherwise, I'd suggest using jQuery for this; it'll be easier. – elclanrs Dec 25 '12 at 18:43
Could you explain how to use jQuery for it? – Neta Meta Dec 25 '12 at 18:48
Ah, I read your link. It seems you are not supposed to parse HTML with regex :-) – Neta Meta Dec 25 '12 at 19:00
