Regular expression to find meta robots noindex tag

Question

I have to check whether a page has a robots noindex meta tag in its source code, and I want to catch as many different HTML syntax variants as possible.

First I tried the get_meta_tags() function, but it has some limitations, so I decided to stick with preg_match.

I tried this regular expression:

"/<meta\s+name\s*=\s*[\"'](.*?)[\"']\s*content\s*=\s*[\"'].*?noindex.*?[\"']\s*\/?>/i"

However, it fails when the content attribute comes before the name attribute, as in a tag shaped like this:

<meta content="follow, index"  name="robots" />

Can anyone share a more appropriate regular expression to achieve my goal?

possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Anonymous, May 10 '14 at 10:38

score 0 · Answer 1 · answered Nov 18 '14 at 21:01

A method that avoids a long, complicated pattern:

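    // $s holds the page's HTML source.
    // Collect every <meta ...> tag, then look for "noindex" in any of them.
    // Note: this does not check that the matching tag is name="robots".
    if (preg_match_all('/<meta\b.*?>/is', $s, $m) && stripos(implode(',', $m[0]), 'noindex') !== false) {
        echo 'page contains noindex meta tag';
    } else {
        echo 'without noindex meta tag';
    }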
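
The check above reports any meta tag that mentions noindex, including, say, a description tag. If that is too loose, one possible tightening is a single pattern with two lookaheads, so that name and content may appear in either order. This is only a sketch and not part of the original answer: the function name `hasRobotsNoindex` is made up for illustration, it assumes the content value is quoted, and it only looks at name="robots" (not crawler-specific names such as googlebot).

```php
<?php
// Hypothetical helper (name chosen for illustration): returns true when the
// HTML contains a <meta> tag with name="robots" whose quoted content value
// includes the word "noindex", regardless of attribute order.
function hasRobotsNoindex(string $html): bool
{
    $pattern = '/<meta\b'
        // Lookahead 1: a name attribute equal to "robots" somewhere in the tag.
        . '(?=[^>]*\bname\s*=\s*["\']?robots\b)'
        // Lookahead 2: a quoted content attribute containing the word "noindex".
        . '(?=[^>]*\bcontent\s*=\s*["\'][^"\'>]*\bnoindex\b)'
        // Consume the rest of the tag.
        . '[^>]*>/i';

    return preg_match($pattern, $html) === 1;
}
```

Called as `hasRobotsNoindex($s)`, this should accept both `<meta name="robots" content="noindex">` and `<meta content="follow, noindex" name="robots" />`, while rejecting a robots tag whose content is only "index, follow". Like any regex over HTML it can still be fooled by unusual markup (for example a `>` inside an attribute value), so a real parser is the safer choice when accuracy matters.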

michail.samolo
