
I'm trying to extract the split times (the `split1` cells) from this string into an array. This is my string and my code:

$split_times = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

preg_match_all("/split1\\\'>(\d+(?:\.\d+)?)</", $split_times, $split_times_distances);
print_r($split_times_distances);

It should return an array like so:

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)

but instead, each of the two arrays contains only its first element (the `28.86` match).
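A minimal sketch to isolate the problem: it applies the question's capture group `\d+(?:\.\d+)?` to each time value on its own, with `^` and `$` standing in for the surrounding `split1\'>` and `<`. Only `28.86` is accepted, because `\d+` cannot consume a `:`.

```php
<?php
// Test the question's capture group on single values; ^ and $ stand in for
// the surrounding split1\'> and < from the full pattern.
$values  = ['28.86', '1:01.56', '1:36.88', '2:59:09.93'];
$pattern = '/^\d+(?:\.\d+)?$/'; // digits, then optional ".digits"

$accepted = [];
foreach ($values as $value) {
    // preg_match() returns 1 on a match and 0 when there is none
    $accepted[$value] = preg_match($pattern, $value) === 1;
    echo str_pad($value, 12), $accepted[$value] ? 'match' : 'no match', PHP_EOL;
}
```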

Derk Jan Speelman

3 Answers


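Your regex doesn't match split1\'>1:36.88< (nor any other time containing a colon): \d+ stops at the first :, so the required < never follows.

You have to add (?:\d+:){0,2} at the beginning of the capture group, so that up to two "digits:" prefixes (minutes, or hours and minutes) are allowed: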

$split_times = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

preg_match_all("/split1\\\'>((?:\d+:){0,2}\d+(?:\.\d+)?)</", $split_times, $split_times_distances);
//                    here __^^^^^^^^^^^^^
print_r($split_times_distances);

Output:

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)
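To see what the added group accepts, here is a small sketch that tests the answer's capture group, anchored with `^` and `$`, against a few single values. The last two inputs are made-up examples of values it rejects: three colon-separated prefixes (more than `{0,2}` allows), and a colon directly followed by a dot.

```php
<?php
// Test the answer's capture group on single values; ^ and $ stand in for the
// surrounding split1\'> and < from the full pattern.
// (?:\d+:){0,2} allows zero, one or two "digits:" prefixes before the seconds.
$pattern = '/^(?:\d+:){0,2}\d+(?:\.\d+)?$/';

$inputs   = ['28.86', '1:01.56', '2:59:09.93', '1:2:3:4.5', '1:.5'];
$accepted = [];
foreach ($inputs as $input) {
    $accepted[$input] = preg_match($pattern, $input) === 1;
    echo str_pad($input, 12), $accepted[$input] ? 'match' : 'no match', PHP_EOL;
}
```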
Toto

You have already extracted your string from an onMouse... attribute using DOMDocument, so why not continue with it? Even without a dedicated JavaScript parser, it's easy to extract JavaScript string literals; then all you have to do is remove the escaped quotes to obtain the "raw" HTML string:

$onMouseAttr = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td><td class=\'split2\'>32.70</td></tr><tr><td class=\'splitsep\' colspan=\'3\'></td></tr><tr><td class=\'split0\'>150m</td><td class=\'split1\'>1:36.88</td><td class=\'split2\'>35.32</td></tr><tr><td class=\'split0\'>200m</td><td class=\'split1\'>2:59:09.93</td><td class=\'split2\'>33.05</td></tr></table>')";

# first step: extracting the strings

$stringPattern = <<<'EOD'
~ " ( [^"\\]* (?:\\.[^"\\]*)* ) "  |  ' ( [^'\\]* (?:\\.[^'\\]*)* ) ' ~xsS
EOD;

if ( preg_match_all($stringPattern, $onMouseAttr, $matches, PREG_SET_ORDER) ) {

    foreach ($matches as $match) {
        # unescape the string for the correct quote
        $html = isset($match[2]) ? str_replace("\\'", "'", $match[2])
                                 : str_replace('\\"', '"', $match[1]);

        # extract the nodes you want with DOMDocument/DOMXPath
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xp = new DOMXPath($dom);
        $nodeList = $xp->query('//td[@class="split1"]');
        foreach ($nodeList as $node) {
            # display them
            echo $node->nodeValue, PHP_EOL;
            # or store them
            # $results[] = $node->nodeValue;
        }
    }
}
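If you want the values in an array, as the question asks, the same steps can be wrapped in a function that uses the commented-out `$results[]` branch. This is a minimal sketch: the function name `extractSplit1Values` and the shortened input string are hypothetical, and the empty-string guard and `libxml_use_internal_errors()` call are additions not in the answer above.

```php
<?php
// Hypothetical helper: same steps as above, but the split1 cell texts are
// collected into an array instead of being echoed.
function extractSplit1Values(string $onMouseAttr): array
{
    // Double- or single-quoted JavaScript string, allowing \-escaped chars.
    $stringPattern = <<<'EOD'
~ " ( [^"\\]* (?:\\.[^"\\]*)* ) "  |  ' ( [^'\\]* (?:\\.[^'\\]*)* ) ' ~xsS
EOD;

    $results = [];
    if (!preg_match_all($stringPattern, $onMouseAttr, $matches, PREG_SET_ORDER)) {
        return $results;
    }

    // Collect libxml parse warnings internally instead of printing them
    libxml_use_internal_errors(true);

    foreach ($matches as $match) {
        // Group 2 is only set when the single-quoted alternative matched
        $html = isset($match[2]) ? str_replace("\\'", "'", $match[2])
                                 : str_replace('\\"', '"', $match[1]);
        if ($html === '') {
            continue; // loadHTML() rejects an empty string
        }

        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xp = new DOMXPath($dom);
        foreach ($xp->query('//td[@class="split1"]') as $node) {
            $results[] = $node->nodeValue;
        }
    }
    return $results;
}

$onMouseAttr = "return escape('<table class=\'split\' ><tr><td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td></tr><tr><td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td></tr></table>')";
print_r(extractSplit1Values($onMouseAttr));
```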
Casimir et Hippolyte
  • Thank you for your answer, this is indeed a solid way to do this. What I eventually came up with was this: `preg_match_all("/split0\\\'>(.*?)m/", $split_times, $split_times_distances)`. I have another question, about XML and its textContent property, that I think is a really interesting one; I will post another comment here to notify you! – Derk Jan Speelman Jul 10 '17 at 21:12
  • [This is my XML question](https://stackoverflow.com/questions/45022059/xml-php-different-ouputs-of-attribute-inside-query-and-outside-query-but-with) – Derk Jan Speelman Jul 10 '17 at 21:40
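A quick sketch of the `split0` pattern from the comment, run on a shortened, hypothetical copy of the question's string. It needs the closing `/` delimiter; without it, `preg_match_all()` emits a warning and returns `false`. The lazy `(.*?)` captures everything up to the first `m` after `split0\'>`, i.e. the distance without its unit.

```php
<?php
// Shortened copy of the question's string (two rows only).
$split_times = "<td class=\'split0\'>50m</td><td class=\'split1\'>28.86</td>"
             . "<td class=\'split0\'>100m</td><td class=\'split1\'>1:01.56</td>";

// Group 1 holds the text between split0\'> and the next "m"
preg_match_all("/split0\\\'>(.*?)m/", $split_times, $split_times_distances);
print_r($split_times_distances[1]);
```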

What I did was simply capture all characters between two strings, in this case anchored on the <td> class name: everything after split1\'> up to the next <.

preg_match_all("/split1\\\'>(.*?)</", $split_times, $split_times_distances);
print_r($split_times_distances);

Output:

Array
(
    [0] => Array
        (
            [0] => split1\'>28.86<
            [1] => split1\'>1:01.56<
            [2] => split1\'>1:36.88<
            [3] => split1\'>2:59:09.93<
        )

    [1] => Array
        (
            [0] => 28.86
            [1] => 1:01.56
            [2] => 1:36.88
            [3] => 2:59:09.93
        )

)
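The lazy quantifier is what makes this work. A small sketch comparing it with a greedy `(.*)` on a shortened, hypothetical copy of the string: the lazy capture stops at the first `<` after each `split1\'>`, while the greedy one runs on to the last `<` and swallows everything in between.

```php
<?php
// Shortened copy of the question's string.
$split_times = "<td class=\'split1\'>28.86</td><td class=\'split2\'>28.86</td>"
             . "<td class=\'split1\'>1:01.56</td>";

preg_match_all("/split1\\\'>(.*?)</", $split_times, $lazy);  // lazy
preg_match_all("/split1\\\'>(.*)</", $split_times, $greedy); // greedy

print_r($lazy[1]);   // one capture per split1 cell
print_r($greedy[1]); // a single capture spanning both cells
```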
Derk Jan Speelman