Regex to find a specific tag in HTML

Question

I have a chunk of HTML I don't know anything about, and I'm trying to find and replace the value of a certain attribute in a certain tag. So I want to capture the entire tag that has my matching attribute. The tag can also contain any number of unknown attributes. Say I have the following HTML (which is all on one line), where xxxx="zzzz" represents any number of unknown attributes. I want to capture only the meta tag that has content="320".

My problem is that I'm getting every meta tag before the one that contains the attribute I'm looking for.

<meta charset="UTF-8"><meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes"><meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default"><meta name="mobile-web-app-capable" xxxx="zzzz" content="yes"><meta name="HandheldFriendly" xxxx="zzzz" content="True"><meta name="MobileOptimized" xxxx="zzzz" content="[320]"><meta name="format-detection" xxxx="zzzz" content="telephone=no"><meta name="title" xxxx="zzzz" content="Regex101">

Formatted version as a visual aid:

    <meta charset="UTF-8">
    <meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes">
    <meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default">
    <meta name="mobile-web-app-capable" xxxx="zzzz" content="yes">
    <meta name="HandheldFriendly" xxxx="zzzz" content="True">
    <meta name="MobileOptimized" xxxx="zzzz" content="320">
    <meta name="format-detection" xxxx="zzzz" content="telephone=no">
    <meta name="title" xxxx="zzzz" content="Regex101">

This is the regex I am using:

<meta.*?\s+content\s*=\s*\"\[320]\".*?

Example: http://refiddle.com/refiddles/5b10282175622d3008480000
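The symptom can be reproduced by running the question's pattern against the one-line input. This is a minimal sketch using Python's standard `re` module (Python is an assumption; the question doesn't name a language). The engine reports the leftmost match, and `.*?` is allowed to cross `>`, so even though it is lazy the match starts at the very first `<meta`.

```python
import re

# The one-line HTML from the question (note the literal brackets in "[320]").
html = (
    '<meta charset="UTF-8">'
    '<meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default">'
    '<meta name="mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="HandheldFriendly" xxxx="zzzz" content="True">'
    '<meta name="MobileOptimized" xxxx="zzzz" content="[320]">'
    '<meta name="format-detection" xxxx="zzzz" content="telephone=no">'
    '<meta name="title" xxxx="zzzz" content="Regex101">'
)

# The pattern from the question, unchanged.
question_pattern = r'<meta.*?\s+content\s*=\s*\"\[320]\".*?'

m = re.search(question_pattern, html)
# The match begins at the first tag and also covers the five tags before the target.
print(m.start())                 # 0
print(m.group().count('<meta'))  # 6
```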

Don't use Regex for HTML parsing, use a specialized library like [HTML Agility Pack](http://html-agility-pack.net/) — maccettura, May 31 '18 at 16:55
[RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/5233151#5233151) :-) — xanatos, May 31 '18 at 17:03
Thanks. I just used HTML as an example; I'm not parsing HTML, I'm parsing strings that are formed similarly. — user1336827, May 31 '18 at 18:25
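If the input really is HTML, the first comment's advice applies. HTML Agility Pack is a .NET library; as a stand-in, here is a sketch using Python's standard-library `html.parser` (a swapped-in tool, not the one the comment names). The `MetaFinder` class is hypothetical and only finds matching tags; it does not rewrite the document.

```python
from html.parser import HTMLParser


class MetaFinder(HTMLParser):
    """Collects the attributes of every <meta> tag whose content attribute equals a target value."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being parsed.
        if tag == 'meta' and dict(attrs).get('content') == self.target:
            self.matches.append(dict(attrs))


html = (
    '<meta charset="UTF-8">'
    '<meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default">'
    '<meta name="mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="HandheldFriendly" xxxx="zzzz" content="True">'
    '<meta name="MobileOptimized" xxxx="zzzz" content="[320]">'
    '<meta name="format-detection" xxxx="zzzz" content="telephone=no">'
    '<meta name="title" xxxx="zzzz" content="Regex101">'
)

finder = MetaFinder('[320]')
finder.feed(html)
finder.close()
print([m['name'] for m in finder.matches])  # ['MobileOptimized']
```

Unlike a regex, the parser does not care how many attributes a tag has or in which order they appear.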

score 1 · Accepted Answer · answered May 31 '18 at 17:06


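There's a small problem with greediness and with the square brackets, which are special characters in a regex. You were almost there, if I understand you correctly. Even a lazy `.*?` can match across `>`, and the engine always reports the leftmost match, so your match starts at the first `<meta` in the string and runs on until it reaches content="[320]". Using `[^>]*` instead keeps the match inside a single tag, and escaping both brackets (`\[320\]`) makes them unambiguously literal.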

<meta[^>]*\scontent\s*=\s*"\[320\]"[^>]*>

This retrieves the one meta tag with the literal attribute content="[320]" that you asked for.
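Here is a minimal sketch of this pattern in Python's `re` module (Python is an assumption), including the find-and-replace step the question ultimately wants. The capture groups keep everything around the value so that only the value is replaced; the replacement value `[480]` is made up for illustration.

```python
import re

# The one-line HTML from the question.
html = (
    '<meta charset="UTF-8">'
    '<meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default">'
    '<meta name="mobile-web-app-capable" xxxx="zzzz" content="yes">'
    '<meta name="HandheldFriendly" xxxx="zzzz" content="True">'
    '<meta name="MobileOptimized" xxxx="zzzz" content="[320]">'
    '<meta name="format-detection" xxxx="zzzz" content="telephone=no">'
    '<meta name="title" xxxx="zzzz" content="Regex101">'
)

# The accepted answer's pattern: [^>]* cannot cross a '>', so a match
# can never span more than one tag.
answer_pattern = r'<meta[^>]*\scontent\s*=\s*"\[320\]"[^>]*>'

m = re.search(answer_pattern, html)
print(m.group())
# <meta name="MobileOptimized" xxxx="zzzz" content="[320]">

# Find and replace: group 1 is everything up to the opening quote of the value,
# group 2 is the closing quote through the end of the tag.
replace_pattern = r'(<meta[^>]*\scontent\s*=\s*")\[320\]("[^>]*>)'
updated = re.sub(replace_pattern, r'\g<1>[480]\g<2>', html)
print(updated.count('content="[480]"'))  # 1
```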


Rudu


Thanks, this worked like a charm. I knew it was an issue with greedy/non-greedy! – user1336827, May 31 '18 at 18:26

score 0 · Answer 2 · answered May 31 '18 at 17:05


Give this a try. I tested it on https://regex101.com.

<meta.+content="320">
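How this pattern behaves depends on line breaks in the input. In this Python sketch (the language and the shortened three-tag input, using content="320" as in the formatted version of the question, are assumptions), `.` does not match a newline by default, so one tag per line keeps the match to a single tag. When everything is on one line, the greedy `.+` starts at the first `<meta`.

```python
import re

pattern = r'<meta.+content="320">'

# Hypothetical shortened input: three tags from the formatted version, one per line.
multi_line = (
    '<meta name="HandheldFriendly" xxxx="zzzz" content="True">\n'
    '<meta name="MobileOptimized" xxxx="zzzz" content="320">\n'
    '<meta name="title" xxxx="zzzz" content="Regex101">'
)
# The same tags joined onto one line, like the question's raw input.
one_line = multi_line.replace('\n', '')

# '.' stops at '\n', so the match cannot reach back into the previous line.
print(re.search(pattern, multi_line).group())
# <meta name="MobileOptimized" xxxx="zzzz" content="320">

# On a single line, the match also swallows the tag before the target.
print(re.search(pattern, one_line).group().count('<meta'))  # 2
```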


MicahB

