
I am new to PHP and am trying to extract data from a URL using preg_match_all.

The problem is that the matches are returned as strings and I cannot extract them individually.

<?php
$pattern = '/<span class="product".*/i';
$string = file_get_contents('http://www.example.com/');

preg_match_all($pattern, $string, $matches);
echo '<b>preg_match_all()</b>';
echo '<pre>';
echo '<br /><b>Products:</b> ', var_dump($matches);
echo '</pre>';

Returns

preg_match_all()

Products: array(1) {
  [0]=> array(7) {
      [0] => string(46) "Product 1"
      [1] => string(42) "Product 2"
      [2] => string(46) "Product 3"
      [3] => string(41) "Product 4"
      [4] => string(58) "Product 5"
      [5] => string(42) "Product 6"
      [6] => string(37) "Product 7"
  }
}

I am trying to extract one item at a time (i.e. separate elements) and place each into its own variable if possible. Example: $product1 = "Product 1"

If I try echo $matches[2]; to get Product 3, I get an undefined offset error.
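
The undefined offset comes from how preg_match_all() lays out its result. In the default PREG_PATTERN_ORDER mode, $matches[0] is the array of full-pattern matches and $matches[1], $matches[2], ... exist only if the pattern has capture groups. The pattern above has none, so $matches has a single key and the third match is $matches[0][2]. The string lengths in the dump (46 for "Product 1") also suggest each match still contains its markup. A minimal sketch, assuming a hypothetical inline sample in place of the fetched page:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$string = <<<HTML
<span class="product">Product 1</span>
<span class="product">Product 2</span>
<span class="product">Product 3</span>
HTML;

// Same pattern as in the question: no capture groups, so only $matches[0] exists.
$pattern = '/<span class="product".*/i';
preg_match_all($pattern, $string, $matches);

// $matches[0] is the list of full matches; index it a second time for one item.
$product3 = $matches[0][2];

// Each match still contains the tags; strip_tags() leaves just the text.
echo strip_tags($product3), "\n";
```

Here $product3 is an illustrative variable name; any element of $matches[0] can be assigned the same way.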

EDIT:

With help from this thread: Retrieve data contained a certain span class

Solution:

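<?php
$html = file_get_contents('http://www.example.com/');
// Group 1 captures the text inside each <span class="products">...</span>
preg_match_all('/<span class="products">(.*?)<\/span>/', $html, $b);

// $b[0] holds the full matches, $b[1] holds the captured inner text
$products = $b[1];

echo $products[4]; // Index 4 is the fifth match, i.e. "Product 5"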

Yes I am terrible at formatting code

  • Don't use regular expressions to parse HTML. –  Jul 07 '13 at 20:22
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) –  Jul 07 '13 at 20:22
  • You don't `(capture)` anything in your regex. – CodeAngry Jul 07 '13 at 20:29
  • If you had provided an example of what you were trying to parse and the output you were looking to extract, then maybe we would be able to work out what you need your regex to do. For now, it's simply wrong. – symcbean Jul 07 '13 at 20:35
  • Did you not read symcbean? Please read last sentence. – Peeping Tom Jul 07 '13 at 20:36
  • As the code reads it works fine. I just want to be able to retrieve specific elements of the $matches output. – Peeping Tom Jul 07 '13 at 20:38
  • @TomSaget If you paste some sample `span`s here, I'll show you how to extract data from them :) So paste some raw markup you plan to parse. I want to reach 1000 points today ;) – CodeAngry Jul 07 '13 at 20:43
  • @CodeAngry, ok here is similar example. `Used Gibson USA
    Les Test Test Test Paul Custom 1986
    with Factory Kahler
    ` If I use `
    – Peeping Tom Jul 07 '13 at 20:53

1 Answer

$markup = '<span class="Products-Name">Used Gibson USA</span>
    <span class="Products-Discription">Les Test Test Test Paul Custom 1986
    <br />with Factory Kahler </span>';
$markup = preg_replace('~<br\\s*/?>~si', ' ', $markup); // replace <br> with space
$markup = preg_replace('~\\s+~', ' ', $markup); // compact consecutive spaces into a single space
if(preg_match_all('~<span class="Products-(.+?)">(.*?)</span>~si', $markup, $matches)){
    // trim every string in the nested $matches array
    array_walk_recursive($matches, function(&$match){
        $match = trim($match);
    });
    // this shows you how the $matches is structured
    list($raw_matches, $class_matches, $inner_matches) = $matches;
    // combine class names with span inner value
    var_dump(array_combine($matches[1], $matches[2]));
}
// this is how you loop preg_match_all() results
foreach($matches[0] as $key => $raw_match){
    $class_match = $matches[1][$key];
    $inner_match = $matches[2][$key];
    if(!strcasecmp($class_match, 'what YOU seek')){
        echo $inner_match;
    }
}
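
Since the comments advise against parsing HTML with regular expressions, here is a rough alternative sketch using PHP's DOM extension (DOMDocument and DOMXPath). It assumes the dom extension is enabled and reuses the sample markup from this answer; $results is an illustrative name, not something from the question.

```php
<?php
// Sample markup copied from the answer above.
$markup = '<span class="Products-Name">Used Gibson USA</span>
    <span class="Products-Discription">Les Test Test Test Paul Custom 1986
    <br />with Factory Kahler </span>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];
// Select every <span> whose class attribute starts with "Products-".
foreach ($xpath->query('//span[starts-with(@class, "Products-")]') as $span) {
    $name = substr($span->getAttribute('class'), strlen('Products-'));
    // textContent drops the <br />; collapse the remaining whitespace runs.
    $results[$name] = trim(preg_replace('~\s+~', ' ', $span->textContent));
}

print_r($results);
```

The result is keyed by the part of the class name after "Products-", so a single item can be read as $results['Name'] without counting match positions.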
CodeAngry