
I am looking for a regex that matches a meta description tag.

#<meta|name="description|".*content|="([^"]+)"># <- 

That is what I have, but it doesn't match capital letters; I discovered that some tags are written like META DESCRIPTION =, etc.

Is there a different regex, or a way to change this one so that it also matches capital letters?

Nikola K.
Noah Smith
  • That regex doesn't match how you think it does. The problem here is not about capital letters. For example, it would also match the string "foocontent" – memowe Sep 23 '12 at 10:02
  • The problem with your regex (besides the bunch of `OR`s in it) is that a meta tag doesn't always have to be formatted this way: it can have extra spaces, line breaks, different quotes... If it's for scraping purposes, I would suggest gathering all meta tags first and then processing them, like this: `~<meta[^>]+>~is` – Flash Thunder Sep 13 '19 at 09:38

5 Answers

1

Add the `i` flag after your last #.

Like this:

#<meta|name="description|".*content|="([^"]+)">#i

That makes your regular expression case-insensitive. The PHP manual lists the other available flags under pattern modifiers.

Nikola K.
1

Your regex <meta|name="description|".*content|="([^"]+)"> is broken. Each | splits the whole pattern into separate alternatives, so it means:

  • <meta
    OR
  • name="description
    OR
  • " followed by anything followed by content
    OR
  • =" followed by at least one character that is not " followed by ">
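The breakdown above can be checked with a small sketch. Python's `re` module is used here purely for illustration (it treats a top-level `|` the same way for this pattern), and the helper name `first_match` is hypothetical:

```python
import re

# The pattern from the question, without the # delimiters.
BROKEN = r'<meta|name="description|".*content|="([^"]+)">'

def first_match(text):
    """Return the text matched by the first alternative that succeeds, or None."""
    m = re.search(BROKEN, text)
    return m.group(0) if m else None

# Only the first alternative is needed to match an unrelated meta tag.
print(first_match('<meta charset="utf-8">'))   # <meta
# The third alternative matches text that is not a tag at all.
print(first_match('say "my content here'))     # "my content
```

Because any single alternative is enough, a match says almost nothing about whether a description tag was found.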

Warning!

Let me say that parsing HTML with regular expressions is a very bad idea.

Regex alternative for training purposes

But if you want to try something out for training, start by improving this:

#<meta name="description" content="([^"]+)">#i

which is case-insensitive and does what you think it does.
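As a rough illustration, here is the same pattern in Python, where the `i` flag corresponds to `re.IGNORECASE`. The function name `description` is made up for this sketch:

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the trailing i flag.
STRICT = re.compile(r'<meta name="description" content="([^"]+)">', re.IGNORECASE)

def description(html):
    """Return the captured content value, or None if the strict pattern fails."""
    m = STRICT.search(html)
    return m.group(1) if m else None

print(description('<META NAME="description" CONTENT="Hello World">'))  # Hello World
```

The flag only affects how the literal letters in the pattern are compared; the captured value keeps its original case.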

False negatives

Beware that it won't match valid elements like this:

<meta name="description"      content="foo bar baz">

or

<meta
   name="description"
   content="foo bar baz">

or

<meta content="foo bar baz" name="description">
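In line with the warning above, an HTML parser avoids these false negatives. The following is only a sketch using Python's standard `html.parser` module, not something from the original answer; `MetaDescriptionParser` and `meta_description` are hypothetical names:

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag seen."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")

def meta_description(html):
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description

print(meta_description('<meta content="foo bar baz" name="description">'))  # foo bar baz
```

Attribute order, extra whitespace, line breaks and quote style are all handled by the parser, so none of them needs to be encoded in a pattern.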
memowe
1

You can use it like this:

/<meta[^>]*name=[\"\']description[\"\'][^>]*content=[\"]([^\"]*)[\"][^>]*>/i

This works for compressed (minified) HTML code too.
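A quick way to try this on minified markup is sketched below in Python. The quote classes are written as `["']` because inside a character class `|` is a literal pipe, not alternation. The name `description_from` and the sample markup are invented for this example:

```python
import re

# name= may use either quote style; content= must use double quotes here.
PATTERN = re.compile(
    r"""<meta[^>]*name=["']description["'][^>]*content="([^"]*)"[^>]*>""",
    re.IGNORECASE,
)

def description_from(html):
    m = PATTERN.search(html)
    return m.group(1) if m else None

minified = ('<head><meta charset="utf-8">'
            '<meta property="x" name="description" content="foo"><title>t</title></head>')
print(description_from(minified))  # foo
```

Since `[^>]*` cannot cross a `>`, the match stays inside a single tag. The pattern still expects `name` to appear before `content`, so a tag with the attributes reversed is not found.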

  • You may want to account for the fact that either a double or single quote could follow `content=` so the regex would become `/<meta[^>]*name=[\"\']description[\"\'][^>]*content=[\"\'](.*?)[\"\'][^>]*>/i` – Mike Godin Mar 16 '15 at 14:30
1

Check this PHP function to easily get all meta details, including the description.

Web_Developer
0
<meta\s+(?=[^>]*name\s*=\s*("|')description\1)[^>]*content\s*=\s*("|')(.*?)\2[^>]*>
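The answer gives no explanation, so here is one reading of the pattern. The lookahead `(?=...)` checks that `name=description` occurs somewhere in the tag without consuming it, which lets `content` come before or after `name`. The backreferences `\1` and `\2` require each closing quote to match its opening quote. Below is a sketch in Python, assuming case-insensitive and dot-matches-newline flags (the answer does not say which flags it intends); `extract_description` is a hypothetical helper name:

```python
import re

PATTERN = re.compile(
    r"""<meta\s+(?=[^>]*name\s*=\s*("|')description\1)[^>]*content\s*=\s*("|')(.*?)\2[^>]*>""",
    re.IGNORECASE | re.DOTALL,  # assumed flags, not stated in the answer
)

def extract_description(html):
    m = PATTERN.search(html)
    # Group 3 holds the content value; groups 1 and 2 are the quote characters.
    return m.group(3) if m else None

print(extract_description('<meta content="foo bar baz" name="description">'))  # foo bar baz
```

This covers reordered attributes, multi-line tags and single quotes, but it is still a regex over HTML, so unusual markup can defeat it.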
bavo