
I have a chunk of HTML that I know nothing about, and I am trying to find and replace the value of a certain attribute in a certain tag. To do that, I want to capture the entire tag that has my matching attribute. The tag may also contain any number of unknown attributes. Say that I have the following HTML (all on one line), where xxxx="zzzz" stands for any number of unknown attributes. I want to capture only the meta tag that has content="320".

My problem is that my match includes every meta tag before the one that contains the attribute I'm actually looking for.

<meta charset="UTF-8"><meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes"><meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default"><meta name="mobile-web-app-capable" xxxx="zzzz" content="yes"><meta name="HandheldFriendly" xxxx="zzzz" content="True"><meta name="MobileOptimized" xxxx="zzzz" content="[320]"><meta name="format-detection" xxxx="zzzz" content="telephone=no"><meta name="title" xxxx="zzzz" content="Regex101">

Formatted version as a visual aid:

      <meta charset="UTF-8">
      <meta name="apple-mobile-web-app-capable" xxxx="zzzz" content="yes">
      <meta name="apple-mobile-web-app-status-bar-style" xxxx="zzzz" content="default">
      <meta name="mobile-web-app-capable" xxxx="zzzz" content="yes">
      <meta name="HandheldFriendly" xxxx="zzzz" content="True">
      <meta name="MobileOptimized" xxxx="zzzz" content="320">
      <meta name="format-detection" xxxx="zzzz" content="telephone=no">
      <meta name="title" xxxx="zzzz" content="Regex101">

This is the regex I am using: <meta.*?\s+content\s*=\s*\"\[320]\".*?

example: http://refiddle.com/refiddles/5b10282175622d3008480000

user1336827
  • Don't use Regex for HTML parsing; use a specialized library like [HTML Agility Pack](http://html-agility-pack.net/) – maccettura May 31 '18 at 16:55
  • [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/5233151#5233151) :-) – xanatos May 31 '18 at 17:03
  • Thanks. I just used HTML as an example; I'm not parsing HTML, I'm parsing strings that are formed similarly. – user1336827 May 31 '18 at 18:25

2 Answers


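You were almost there, if I understand you correctly. There's a small problem with how far the wildcard (.*) can reach: because . also matches >, the match starts at the first <meta and runs across tag boundaries until it finds the attribute. Also keep in mind that square brackets are metacharacters in a regex, so literal ones need to be escaped. Restricting the wildcard to [^>]* keeps the match inside a single tag: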

<meta[^>]*\scontent\s*=\s*"\[320\]"[^>]*>

This retrieves the one meta tag that has the literal attribute content="[320]", which is what you asked for.
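To make the difference concrete, here is a minimal sketch in Python (an assumption; the question does not say which regex engine or host language is in use). It runs the pattern from the question and the pattern above against a shortened copy of the sample input, then uses capture groups to replace only the attribute value. The replacement value [640] is purely hypothetical.

```python
import re

# Shortened copy of the question's input, all on one line.
html = (
    '<meta charset="UTF-8">'
    '<meta name="HandheldFriendly" xxxx="zzzz" content="True">'
    '<meta name="MobileOptimized" xxxx="zzzz" content="[320]">'
    '<meta name="format-detection" xxxx="zzzz" content="telephone=no">'
)

# Pattern from the question: even a lazy ".*?" may consume ">", so the
# match starts at the first "<meta" and spans several tags.
loose = re.compile(r'<meta.*?\s+content\s*=\s*"\[320]".*?')

# Pattern from this answer: "[^>]*" cannot leave the current tag.
strict = re.compile(r'<meta[^>]*\scontent\s*=\s*"\[320\]"[^>]*>')

loose_match = loose.search(html).group(0)
strict_match = strict.search(html).group(0)

print(loose_match.count("<meta"))  # 3: the match spans three tags
print(strict_match)                # only the MobileOptimized tag

# Replace just the value: capture what surrounds it and put it back.
# "[640]" is a made-up replacement value for illustration.
value_pattern = re.compile(r'(<meta[^>]*\scontent\s*=\s*")\[320\]("[^>]*>)')
updated = value_pattern.sub(r'\g<1>[640]\g<2>', html)
print(updated)
```

The capture groups keep every unknown attribute untouched, so only the text between the quotes of the content attribute changes.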

Rudu

Give this a try. I tested it on regex101 (https://regex101.com).

<meta.+content="320">
MicahB
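A caveat worth checking before relying on the pattern <meta.+content="320">: its result depends on how the input is laid out. The sketch below, written in Python as an assumption about the engine, uses shortened sample strings to show that the greedy .+ spans earlier tags when everything is on one line, matches a single tag when each tag is on its own line, and does not match the bracketed value [320] at all.

```python
import re

greedy = re.compile(r'<meta.+content="320">')

# Shortened sample input, first on one line, then one tag per line.
one_line = (
    '<meta charset="UTF-8">'
    '<meta name="MobileOptimized" xxxx="zzzz" content="320">'
    '<meta name="title" xxxx="zzzz" content="Regex101">'
)
multi_line = one_line.replace("><", ">\n<")

# On one line, ".+" runs from the first "<meta" to the target attribute.
one_line_match = greedy.search(one_line).group(0)

# By default "." does not match a newline, so with one tag per line
# only the line holding the target attribute can match.
multi_line_match = greedy.search(multi_line).group(0)

# The literal value "[320]" from the one-line sample is not matched.
bracketed = greedy.search('<meta name="MobileOptimized" content="[320]">')

print(one_line_match.count("<meta"))  # 2
print(multi_line_match)
print(bracketed)                      # None
```

The pattern also requires content to be the last attribute in the tag, since it expects > immediately after the closing quote.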