
I wrote a script that extracts specific elements from an HTML string. It works, but it also returns values I don't need. I want only the value matched by the second (.*?). How can I modify the expression to capture only the second value and skip the rest?

My script:

<?php
$html = '
<tr><td>AD - Andorra<td>CA - Canada
<tr><td>AE - United Arab Emirates<td>PR - Puerto Rico
<tr><td>AF - Afghanistan<td>US - United States of America
<tr><td>AG - Antigua and Barbuda<td>
';
preg_match_all('/<td>(.*?)<td>(.*?)\n/s', $html, $value);
print_r($value);
?>

2 Answers


There are many ways to skin this cat, and many people would insist that a regular expression is an absolute no-go for parsing HTML, but it seems to me that you are already there: your code produces the correct result in $value[2], an array holding the matches of the second capturing group. Here is a psysh session executing your code:

>>> $html = '
             <tr><td>AD - Andorra<td>CA - Canada
             <tr><td>AE - United Arab Emirates<td>PR - Puerto Rico
             <tr><td>AF - Afghanistan<td>US - United States of America
             <tr><td>AG - Antigua and Barbuda<td>
    ';
    preg_match_all('/<td>(.*?)<td>(.*?)\n/s', $html, $value); 
    print_r($value);
... ... ... ... ... => """
\n
<tr><td>AD - Andorra<td>CA - Canada\n
<tr><td>AE - United Arab Emirates<td>PR - Puerto Rico\n
<tr><td>AF - Afghanistan<td>US - United States of America\n
<tr><td>AG - Antigua and Barbuda<td>\n
"""
>>> => 4
>>> Array
(
    [0] => Array
        (
            [0] => <td>AD - Andorra<td>CA - Canada
            [1] => <td>AE - United Arab Emirates<td>PR - Puerto Rico
            [2] => <td>AF - Afghanistan<td>US - United States of America
            [3] => <td>AG - Antigua and Barbuda<td>
        )
    [1] => Array
        (
            [0] => AD - Andorra
            [1] => AE - United Arab Emirates
            [2] => AF - Afghanistan
            [3] => AG - Antigua and Barbuda
        )
    [2] => Array
        (
            [0] => CA - Canada
            [1] => PR - Puerto Rico
            [2] => US - United States of America
            [3] => 
        )
)
=> true
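
Outside the interactive session, picking out that second group could look like the minimal sketch below. The variable names $second and $nonEmpty, and the step that drops empty cells, are additions for illustration only and are not part of the original script.

```php
<?php
// Same input as in the question.
$html = '
<tr><td>AD - Andorra<td>CA - Canada
<tr><td>AE - United Arab Emirates<td>PR - Puerto Rico
<tr><td>AF - Afghanistan<td>US - United States of America
<tr><td>AG - Antigua and Barbuda<td>
';

// Unchanged pattern from the question: two capturing groups per row.
preg_match_all('/<td>(.*?)<td>(.*?)\n/s', $html, $value);

// $value[2] holds what the second capturing group matched, one entry per row.
$second = $value[2];

// Illustrative extra step (an assumption, not in the question):
// drop rows whose second cell is empty and reindex the array.
$nonEmpty = array_values(array_filter($second, function ($cell) {
    return $cell !== '';
}));

print_r($nonEmpty);
```

If the empty entry for the last row matters to you, use $second as it is and skip the filtering step.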

You can modify the regular expression to capture only the second column by turning the first group into a non-capturing group: '/<td>(?:.*?)<td>(.*?)\n/s' (notice the ?: added after the first opening parenthesis). Your desired result then sits in $value[1]. The modified code executed:

>>> $html = '
    <tr><td>AD - Andorra<td>CA - Canada
    <tr><td>AE - United Arab Emirates<td>PR - Puerto Rico
    <tr><td>AF - Afghanistan<td>US - United States of America
    <tr><td>AG - Antigua and Barbuda<td>
';
preg_match_all('/<td>(?:.*?)<td>(.*?)\n/s', $html, $value);
print_r($value);
... ... ... ... ... => """
   \n
   <tr><td>AD - Andorra<td>CA - Canada\n
   <tr><td>AE - United Arab Emirates<td>PR - Puerto Rico\n
   <tr><td>AF - Afghanistan<td>US - United States of America\n
   <tr><td>AG - Antigua and Barbuda<td>\n
"""
>>> => 4
>>> Array
(
    [0] => Array
        (
            [0] => <td>AD - Andorra<td>CA - Canada
            [1] => <td>AE - United Arab Emirates<td>PR - Puerto Rico    
            [2] => <td>AF - Afghanistan<td>US - United States of America
            [3] => <td>AG - Antigua and Barbuda<td> 
        )
    [1] => Array
        (
            [0] => CA - Canada
            [1] => PR - Puerto Rico
            [2] => US - United States of America
            [3] => 
        ) 
)
=> true
Tom Regner

You can split the string with explode() and work with the resulting array: http://php.net/manual/en/function.explode.php
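
As a rough sketch of that approach, assuming each row sits on its own line and every cell starts with a bare <td> (no attributes), the split could look like this. The variable names $second and $cells are only illustrative.

```php
<?php
// Same input as in the question.
$html = '
<tr><td>AD - Andorra<td>CA - Canada
<tr><td>AE - United Arab Emirates<td>PR - Puerto Rico
<tr><td>AF - Afghanistan<td>US - United States of America
<tr><td>AG - Antigua and Barbuda<td>
';

$second = [];

// Split the input into lines, then split each line on the <td> marker.
foreach (explode("\n", trim($html)) as $line) {
    $cells = explode('<td>', $line);
    // $cells[0] is "<tr>", $cells[1] the first column, $cells[2] the second.
    // Fall back to an empty string if a row has no second cell at all.
    $second[] = trim($cells[2] ?? '');
}

print_r($second);
```

This avoids regular expressions entirely, but it relies on the markup staying as regular as in the sample; for arbitrary HTML a real parser would be safer.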

josedan10