-1

Let's say the text I have is:

 <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>

I am trying to write a pattern that captures the text inside each span. So far I've managed to match the first span without a problem, but I have trouble with the rest.

Here is what I have so far:

\>(.*?\$\s*?(\d+\.?\d+).*?)\<

I thought of using something like \>\r*?\n*?(.*?\$\s*?(\d+\.?\d+).*?)>\r*?\n*?\< to catch the others, but it doesn't work.

BenMorel
Neta Meta

4 Answers

4

You shouldn't be using regex to match markup languages; as soon as nested tags are involved, things get hairy very quickly. That said, for your examples, where there is just plain text between two innermost tags, you could give this a try:

>[^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*<

That will match any text between two >...< delimiters (unless it contains angle brackets itself) that contains at least one number preceded by a $. If there is more than one such number, it captures the last one, because the leading [^<>]* is greedy.

Explanation:

>       # Match >
[^<>]*  # Match anything besides < or >
\$      # Match $
\s*     # Match optional whitespace
(       # Match and capture...
 \d+    # a number
 (?:    # possibly followed by:
  \.\d* #  a dot and optional digits
 )?     # but make that part optional.
)       # End of capturing group
[^<>]*  # Match anything besides < or >
<       # Match <
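As a quick check of the pattern above, here is a minimal JavaScript sketch that runs it against the sample markup from the question. The variable names (`html`, `pricePattern`, `prices`) are illustrative only, and the `g` flag is added here so that `matchAll` can iterate over every match; neither is part of the original answer.

```javascript
// Sample markup copied from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// The pattern from this answer; [^<>] also matches newlines,
// so spans that wrap across lines are handled.
const pricePattern = />[^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*</g;

// Collect capture group 1 (the number after the $) from every match.
const prices = Array.from(html.matchAll(pricePattern), (m) => m[1]);

console.log(prices); // [ '2502', '25102', '2102' ]
```

The span without a price ("one something 2") produces no match, which is the behaviour the question asks for.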
Tim Pietzcker
  • What approach would you suggest other than matching the markup? – Neta Meta Dec 25 '12 at 18:33
  • I've added () to yours, `>([^<>]*\$\s*(\d+(?:\.\d*)?)[^<>]*)<`, and it seems to catch everything I needed. I was wondering, though, if you could elaborate on a better way. – Neta Meta Dec 25 '12 at 18:50
  • Usually, you would use a DOM parser to identify relevant tags. Then you can use regex to find (for example) prices within the tag's text contents. But as I said, for such a simple example as yours, regex is probably sufficient. Just be aware of its limitations. – Tim Pietzcker Dec 25 '12 at 19:02
1
<?php 
$string = ' <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>';
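// Modifiers: U = ungreedy, s = dot also matches newlines, i = case-insensitive
preg_match_all('~<span>(.+)</span>~Usi', $string, $matches);
// $matches[1] holds the text captured inside each span
print_r($matches[1]);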
?>

Works fine for me.
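For comparison, here is a rough JavaScript counterpart of the PHP pattern, offered as a sketch rather than an exact translation. It assumes a runtime that supports the `s` (dotAll) flag and `String.prototype.matchAll`; the lazy `+?` stands in for PHP's `U` modifier, and the `.trim()` call and variable names are additions for readability.

```javascript
// Sample markup copied from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// +? is lazy (like PHP's U), s lets . match newlines, i ignores case.
const spanPattern = /<span>(.+?)<\/span>/gis;

// Capture the contents of every span and strip surrounding whitespace.
const texts = Array.from(html.matchAll(spanPattern), (m) => m[1].trim());

console.log(texts);
// [ 'one something 1 $2502', 'one something 2',
//   'one something 3 $25102', 'one something 4 $2102' ]
```

Note that this returns every span, including the one without a price, so the result would still need to be filtered if only priced entries are wanted.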

flxapps
  • I am not trying to find everything within the span. I am trying to find something that matches the price pattern. – Neta Meta Dec 25 '12 at 18:33
  • Oh, okay. I don't know exactly what your actual source looks like, but on your example source the following expression works; you might want to give it a try: `preg_match_all('~.*\$([0-9]+)\s*~Usi', $string, $matches);` – flxapps Dec 25 '12 at 18:47
0

Just picking everything within the span is simple: <span>([^<]*)<\/span>

Let me know if this works for you.

If you only want the price: <span>[^$<]*(\$\d+)[^<]*<\/span> should work
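To illustrate the price-only pattern, the following JavaScript sketch applies it to the sample markup from the question. The variable names and the `g` flag are assumptions added for the demonstration.

```javascript
// Sample markup copied from the question.
const html = ` <div>
    <span>one something 1 $2502</span><br>

    <span>
        one something 2
    </span><br>

    <span>one something 3 $25102
    </span><br>

    <span>
    one something 4 $2102</span><br>
</div>`;

// [^$<]* skips text before the price, (\$\d+) captures the price
// including the $ sign, and [^<]* skips anything up to </span>.
const pricePattern = /<span>[^$<]*(\$\d+)[^<]*<\/span>/g;

const prices = Array.from(html.matchAll(pricePattern), (m) => m[1]);

console.log(prices); // [ '$2502', '$25102', '$2102' ]
```

Because the capture group is `\$\d+`, a price with a decimal part such as $25.10 would be captured only up to the dot, and whitespace between the $ and the digits would prevent a match.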

davidrac
0

I wouldn't use a regex for this. If you add an id to your div, you can easily grab the spans' text using the DOM:

var div = document.getElementById('mydiv');

var text = [].slice.call( div.childNodes ).filter(function( node ){
  return node.nodeName == 'SPAN'
}).map(function( span ){ return span.innerText });

console.log( text ); //=> ["one something 1 $2502", "one something 2", "one something 3 $25102", "one something 4 $2102"]

Edit: With jQuery, you can select the spans by a structural pattern instead. For example, if you know that every span you want to grab is followed by a br tag, you could find them like this:

var $spans = $('span').filter(function(){
  return $(this).next('br').length
});

var text = $spans.map(function(){
  return $(this).text();
}).get(); // .get() converts the jQuery result into a plain array

If the pattern is not unique then you might have to use regex after all...

elclanrs