I have this data:

<meta name="description" content="Access Kenya is Kenya's leading corporate Internet service provider and is a technology solutions provider in Kenya with IT and network solutions for your business.Welcome to the Yellow Network.Kenya's leading Corporate and Residential ISP" />;

I am using this Regular Expression:

<meta +name *=[\"']?description[\"']? *content=[\"']?([^<>'\"]+)[\"']?

I use it to get a web page's description. It works fine, but the match stops wherever there is an apostrophe.

How do I escape that?

Kaspar Lee
  • ... and if the attribute value is wrapped by single quotes, you will have same problem for matching a double quote, right? Take a look at [this answer](http://stackoverflow.com/a/1732454/3294262) – fusion3k Apr 09 '16 at 15:45
  • Furthermore, you allow for the (unlikely) case of a meta without quotes. [See what happens in this case](https://regex101.com/r/hQ1gB0/1). – fusion3k Apr 09 '16 at 15:50
  • @fusion3k I have a fallback plan for that. Thank you anyway – philip wanekeya Apr 09 '16 at 16:40

1 Answer

Your regular expression considers these three forms of a <meta> node:

<meta name="description" content="Some Content" />
<meta name='description' content='Some Content' />
<meta name=description content=Some Content />

The third form is not valid HTML (an unquoted attribute value cannot contain spaces), but anything can turn up in real pages, so you are right to allow for it.

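The simple fix is to modify your original regular expression so that it also matches the end of the tag, and to capture the content with the lazy (non-greedy) quantifier .*? instead of a negated character class: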

<meta +name *=[\"']?description[\"']? *content=[\"']?(.*?)[\"']? */?>
                                                      └─┘       └───┘
          search zero-or-more characters except following       closing tag characters

regex101 demo
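
As a rough check of the pattern above, here is a minimal PHP sketch that runs it against a shortened version of the description from the question. The ~ delimiter, the i flag and the variable names are my own choices, not part of the original pattern.

```php
<?php
// Sample tag, shortened from the question; note the apostrophe in "Kenya's".
$html = '<meta name="description" content="Access Kenya is Kenya\'s leading corporate Internet service provider." />';

// Same pattern as above, wrapped in ~ delimiters (assumption) and made case-insensitive.
$pattern = '~<meta +name *=["\']?description["\']? *content=["\']?(.*?)["\']? */?>~i';

$description = null;
if (preg_match($pattern, $html, $m)) {
    // Group 1 holds everything between content= and the end of the tag.
    $description = $m[1];
}

echo $description, PHP_EOL;
```

The lazy group keeps growing past the apostrophe because the characters after it do not form the end of the tag.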

But, even in this case, what happens if you have this meta?

<meta content="Some Content" name="description" />

Your regular expression will fail.
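
A quick way to see this, sketched with the same pattern wrapped in ~ delimiters (my assumption) and illustrative variable names:

```php
<?php
// Pattern from the answer; it expects name= immediately after <meta.
$pattern = '~<meta +name *=["\']?description["\']? *content=["\']?(.*?)["\']? */?>~i';

// Same tag with the attributes in the opposite order.
$reordered = '<meta content="Some Content" name="description" />';

// preg_match() returns 1 on a match and 0 when nothing matches.
$matched = preg_match($pattern, $reordered);
var_dump($matched); // int(0)
```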

To match an HTML node reliably, you must use a parser:

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about malformed HTML
$dom->loadHTML( $yourHtmlString );
$xpath = new DOMXPath( $dom );

// select the content attribute of <meta name="description">
$description = $xpath->query( '//meta[@name="description"]/@content' );
echo $description->item(0)->nodeValue;

will output:

Some Content

Yes, it takes a few lines instead of one, but with this method you will match any <meta name="description">, whatever the attribute order, even if the tag contains a third, invalid attribute.
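
If you need this in more than one place, the parser approach could be wrapped in a small helper. This is only a sketch: the function name getMetaDescription and the null return for a missing tag are my own assumptions, not part of the code above.

```php
<?php
// Hypothetical helper (name and null fallback are assumptions).
function getMetaDescription(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//meta[@name="description"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no <meta name="description"> with a content attribute
    }

    return $nodes->item(0)->nodeValue;
}

// content comes before name, and the value contains an apostrophe.
$html = '<html><head><meta content="Kenya\'s leading ISP" name="description" /></head><body></body></html>';
$description = getMetaDescription($html);
echo $description ?? 'no description found', PHP_EOL;
```

Returning null rather than calling item(0) unconditionally avoids an error on pages that have no description at all.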


Community
fusion3k