I need to check whether a page has a robots noindex meta tag in its source code, and I want to catch as many HTML syntax variants as possible.

First I tried the get_meta_tags() function, but it has some limitations, so I decided to stick with preg_match.

I tried this regular expression:

    "/<meta\s+name\s*=\s*[\"'](.*?)[\"']\s*content\s*=\s*[\"'].*?noindex.*?[\"']\s*\/?>/i"

However, it fails when the meta tag has its content attribute first, like this:

    <meta content="follow, index"  name="robots" />

Can anyone share a more appropriate regular expression to achieve my goal?

John Bupit
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Anonymous May 10 '14 at 10:38

1 Answer

A method that avoids one long regular expression:

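    // Collect every <meta ...> tag, then search the joined tags for "noindex" (case-insensitive).
    // Note: this also matches "noindex" inside unrelated meta tags, e.g. a description.
    if (preg_match_all('/<meta\b.*?>/is', $s, $m) && stripos(implode(',', $m[0]), 'noindex') !== false) {
        echo 'page contains noindex meta tag';
    } else {
        echo 'without noindex meta tag';
    }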
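
The check above accepts "noindex" in any meta tag, so a description that merely mentions the word would also count. A stricter variant of the same two-step idea could look like the sketch below. It is only a rough illustration: the function name hasRobotsNoindex is made up for this example, it assumes the content value is quoted, and it does not try to skip meta tags inside HTML comments. Because name and content are tested separately, their order in the tag does not matter.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the original post).
// Returns true when some <meta> tag has name=robots and a quoted content
// value that contains "noindex", regardless of attribute order.
function hasRobotsNoindex(string $html): bool
{
    // Step 1: collect every <meta ...> tag.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return false;
    }

    foreach ($tags[0] as $tag) {
        // Step 2: skip tags whose name attribute is not "robots"
        // (double-quoted, single-quoted, or unquoted).
        if (!preg_match('/\sname\s*=\s*["\']?robots["\']?(?=[\s\/>])/i', $tag)) {
            continue;
        }
        // Step 3: read the quoted content value and look for "noindex".
        // Assumption: unquoted content values are not handled here.
        if (preg_match('/\scontent\s*=\s*(["\'])(.*?)\1/is', $tag, $m)
            && stripos($m[2], 'noindex') !== false) {
            return true;
        }
    }

    return false;
}
```

It can be called as hasRobotsNoindex($s), where $s holds the page source as in the snippet above. For pages with very irregular markup, an HTML parser is usually a safer choice than regular expressions.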