I want to check for my link on a website, but I also want to check whether it is visible. I wrote this code:

    $content = file_get_contents('tmp/test.html');
    $pattern = '/<a\shref="http:\/\/mywebsite.com(.*)">(.*)<\/a>/siU';
    $matches = [];
    if(preg_match($pattern, $content, $matches)) {
        $link = $matches[0];
        $displayPattern = '/display(.?):(.?)none/si';
        if(preg_match($displayPattern, $link)) {
            echo 'not visible';
        } else {
            echo 'visible';
        }
    } else {
        echo 'not found the link';
    }

It works, but not perfectly. If the link looks like this:

    <a class="sg" href="http://mywebsite.com">mywebsite.com</a>

the first pattern won't match, but if I change the \s to (.*), it returns a string starting from the first a tag in the document. The second problem is that I need two patterns. Is there any way to merge the first with the negation of the second? The merged pattern would have two results: visible, or not found/invisible.

Jenz
MrRP

1 Answer

I'll try to guess. You run into the problem when the HTML you fetch with file_get_contents looks like this:

    <a class="sg" href="http://mywebsite.com">mywebsite.com</a>
    .
    .
    .
    <a href="http://mywebsite.com">mywebsite.com</a>

Your regex will return everything from the first <a> tag onward, because the dot also matches newlines (that is the 's' flag; I guess you need it, but if you don't, remove it).
Therefore

    .*

will keep matching as far as it can, so you need to make it lazy (a lazy quantifier stops as soon as the rest of the pattern can match), like this

    .*?
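To make the difference concrete, here is a minimal sketch. The sample HTML and the variable names are made up for illustration. It also shows that the 'U' flag from the question inverts greediness, so `.*?` behaves greedily again when 'U' is set.

```php
<?php
// Hypothetical sample input: two links separated by a newline.
$html = '<a class="sg" href="http://mywebsite.com">first</a>' . "\n"
      . '<a href="http://mywebsite.com">second</a>';

// Greedy: .* runs on to the LAST </a> in the input.
preg_match('/<a.*<\/a>/s', $html, $greedy);

// Lazy: .*? stops at the FIRST </a>.
preg_match('/<a.*?<\/a>/s', $html, $lazy);

// With the 'U' flag the meaning flips, so .*? is greedy here.
preg_match('/<a.*?<\/a>/sU', $html, $inverted);

var_dump($greedy[0] === $html);   // bool(true): the whole input matched
echo $lazy[0], "\n";              // only the first link
var_dump($inverted[0] === $html); // bool(true): greedy again under 'U'
```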

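Your regex should then look like this (drop the 'U' flag if you write the lazy quantifiers explicitly, because 'U' inverts greediness and would make .*? greedy):

    <a.*?href="http:\/\/mywebsite\.com(.*?)">(.*?)<\/a>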
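As for merging the two checks into one pattern, one possible sketch uses a negative lookahead. This is an assumption on my side, not something tested against your real pages: the helper name `linkIsVisible` is hypothetical, and I swapped `.*?` for `[^>]*` inside the tag so the match cannot run past the tag's closing `>`. It only looks at an inline `display: none` in the link's own tag, so a link hidden through a CSS class or a parent element would still count as visible.

```php
<?php
// Hypothetical helper (not from the question): true only if a link to
// mywebsite.com exists and its own tag has no inline "display: none".
function linkIsVisible(string $html): bool
{
    $pattern = '/<a\s'                                    // tag start plus one whitespace
             . '(?![^>]*display\s*:\s*none)'              // reject tags hidden inline
             . '[^>]*href="http:\/\/mywebsite\.com[^"]*"' // href anywhere in the tag
             . '[^>]*>(.*?)<\/a>/si';                     // rest of tag, lazy link text

    return preg_match($pattern, $html) === 1;
}

var_dump(linkIsVisible('<a class="sg" href="http://mywebsite.com">mywebsite.com</a>')); // bool(true)
var_dump(linkIsVisible('<a style="display: none" href="http://mywebsite.com">x</a>'));  // bool(false)
```

This gives the two outcomes you asked for: `true` means visible, `false` means not found or invisible.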
Traxo