I've inherited a piece of code that has suddenly stopped working. It uses a regular expression to match various pieces of data in HTML content. I'm not sure whether the spacing in the HTML content has recently changed or whether there is a larger issue. I could set up smaller matches on the individual pieces of data, but I would rather keep everything in a single preg_match_all call.

Here is a sample of the code in question, along with a sandbox link that shows its execution.

http://sandbox.onlinephpfunctions.com/code/2ebb3707a5d8cd5871005b4e77076cd230a8abca

$html_content = '<tr>
                <td class="">

                    <span class="new">1111</span>
                </td>
                <td data-order="title1"><a href="test.php?id=1111">title1</a></td>
                <td data-order="20190917000000">09/2019</td>
                <td data-order="1">$1</td>
                <td>02/18/2020</td>
            </tr>

            <tr>
                <td class="">

                    <span class="new">2222</span>
                </td>
                <td data-order="title2"><a href="test.php?id=2222">title2</a></td>
                <td data-order="20190917000000">09/2019</td>
                <td data-order="2">$2</td>
                <td>01/13/2020</td>
            </tr>

            <tr>
                <td class="">

                    <span class="new">3333</span>
                </td>
                <td data-order="title3"><a href="test.php?id=3333">title3</a></td>
                <td data-order="20190917000000">09/2019</td>
                <td data-order="5">$5</td>
                <td>01/13/2020</td>
            </tr>';

$content_array = array();
preg_match_all('%>(\d+)</span>\s+</td>\s+<td data-order=".+?"><a href="(.+?)">.+?</a></td>\s+<td data-order="(\d+)000000">\d+/\d+</td>\s+<td data-order="\d+">\$\d+</td>\s+<td>(\d+/\d+/\d+)</td>\s+<td>(\d+/\d+/\d+)</td>%', $html_content, $content_array);

print_r($content_array);
user1110562
  • https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – AbraCadaver Mar 02 '20 at 22:18

1 Answer

The regex expects two date cells at the end of each row:

<td>(\d+/\d+/\d+)</td>\s+<td>(\d+/\d+/\d+)</td>

And you have only one:

<td>01/13/2020</td>

If you remove the extra date cell from the regex, it matches:

>(\d+)</span>\s+</td>\s+<td data-order=".+?"><a href="(.+?)">.+?</a></td>\s+<td data-order="(\d+)000000">\d+/\d+</td>\s+<td data-order="\d+">\$\d+</td>\s+<td>(\d+/\d+/\d+)</td>
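
For reference, here is a minimal sketch of the corrected pattern dropped back into a preg_match_all call. It runs against a shortened copy of the sample HTML from the question (two rows, indentation trimmed). The PREG_SET_ORDER flag and the names $pattern, $count and $row are my own additions for readability, not part of the original code, so adapt them as needed.

```php
<?php
// Shortened copy of the question's HTML: each row ends with ONE date cell.
$html_content = '<tr>
    <td class="">

        <span class="new">1111</span>
    </td>
    <td data-order="title1"><a href="test.php?id=1111">title1</a></td>
    <td data-order="20190917000000">09/2019</td>
    <td data-order="1">$1</td>
    <td>02/18/2020</td>
</tr>

<tr>
    <td class="">

        <span class="new">2222</span>
    </td>
    <td data-order="title2"><a href="test.php?id=2222">title2</a></td>
    <td data-order="20190917000000">09/2019</td>
    <td data-order="2">$2</td>
    <td>01/13/2020</td>
</tr>';

// Same pattern as in the question, minus the second date cell at the end.
$pattern = '%>(\d+)</span>\s+</td>\s+<td data-order=".+?"><a href="(.+?)">.+?</a></td>\s+<td data-order="(\d+)000000">\d+/\d+</td>\s+<td data-order="\d+">\$\d+</td>\s+<td>(\d+/\d+/\d+)</td>%';

$content_array = array();
// PREG_SET_ORDER (an assumption, not in the original call) groups captures per row.
$count = preg_match_all($pattern, $html_content, $content_array, PREG_SET_ORDER);

foreach ($content_array as $row) {
    // $row[1] = id, $row[2] = href, $row[3] = data-order date prefix, $row[4] = date cell
    echo $row[1], ' ', $row[2], ' ', $row[3], ' ', $row[4], "\n";
}
```

If the page can sometimes contain a second date cell again, a return value of 0 from preg_match_all is a quick signal that the row layout and the pattern have drifted apart.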
vuryss