I am trying to get some data from Amazon and I'm using preg_match to find the elements that I need. However, I'm running into issues.

I combined two patterns into one so that if it doesn't find one, it looks for the other. I believe that, unless the product is not listed, one of the two will always exist.

So what it's doing is looking for the shipping cost. If that's not there, it looks for the "FREE Shipping" text.

preg_match_all('/(& <b>(.*?)<|<span class="olpShippingPrice">(.*?)<)/',$results,$match1);

If I run this I get the data I want, but it's also grabbing some HTML that would NOT be grabbed if I ran this as two separate preg_match calls. I cannot figure out how to show it here, but it's grabbing the bold tag on the first 'FREE Shipping', so all the text below that is rendered bold. You can also see the trailing angle brackets.

  [1]=>
   array(10) {
     [0]=>
     string(38) "$30.00<"
     [1]=>
     string(37) "$6.99<"
     [2]=>
     string(37) "$6.99<"
     [3]=>
     string(38) "$53.99<"
     [4]=>
     string(37) "$5.25<"
     [5]=>
     string(19) "& FREE Shipping<"
     [6]=>
     string(19) "& FREE Shipping<"
     [7]=>
     string(19) "& FREE Shipping<"
     [8]=>
     string(19) "& FREE Shipping<"
     [9]=>
     string(38) "$70.39<"
   }

So my question: What must I do to remove the tags and the angle brackets from this so I am left with clean data? Also, running these as two separate preg_match calls doesn't work for me.

smack-a-bro

1 Answer

Without seeing your sample text, it's hard to know exactly what you need. But the main thing is to move those "unwanted" characters out of the capture group by putting the two prefixes in a non-capturing group, (?:...). The single remaining capture group, $match1[1], then holds your clean data:

preg_match_all('/(?:& <b>|<span class="olpShippingPrice">)(.*?)</',$results,$match1);
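
To see the difference, here is a minimal sketch run against made-up markup. The $results string below is only an assumption about what the page fragment might look like, since the real HTML was not posted; the pattern is the one above, unchanged.

```php
<?php
// Hypothetical sample markup; the real Amazon HTML is not shown in the question.
$results = '<span class="olpShippingPrice">$6.99</span> shipping '
         . '<p>& <b>FREE Shipping</b></p> '
         . '<span class="olpShippingPrice">$5.25</span> shipping';

// (?:...) groups the two prefixes without capturing them, so the only
// capture group is the text between the prefix and the next "<".
preg_match_all('/(?:& <b>|<span class="olpShippingPrice">)(.*?)</', $results, $match1);

// $match1[0] holds the full matches (prefix and trailing "<" included);
// $match1[1] holds only the captured text.
print_r($match1[1]);
```

With this sample input, $match1[1] should contain just '$6.99', 'FREE Shipping' and '$5.25'. In the original pattern the outer parentheses were group 1, which is why the tags and the trailing "<" showed up in [1].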
Brian Stephens