Can't figure out how to get a pattern enclosed in HTML tag using a Regex

Question

I just started learning about Regexes and can't figure out how to lift Gizmo from the HTML tag

<meta content="Gizmo" property="og:title" />

I'm stuck at the (?<Name>meta content=), which is basically nothing, but I don't know what to do from there.

Federico Piazza · Accepted Answer · 2015-05-11T00:05:52.743

2

It's well known that you shouldn't use regex to parse HTML (it's been said a million times); you should use an HTML parser instead.

That said, if you want to use a regex for this, you are pretty close. You can use:

(?<Name>meta content=".*?")

Btw, if you want to grab the word Gizmo, you also have to use a capturing group within your Name group:

(?<Name>meta content="(.*?)")
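As a rough illustration of how the nested group in the pattern above behaves, here is a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<Name>...)` instead of `(?<Name>...)`; the variable names are made up for the example, and group numbering may differ in other regex engines.

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# Python's re module writes named groups as (?P<Name>...), not (?<Name>...).
pattern = re.compile(r'(?P<Name>meta content="(.*?)")')

match = pattern.search(html)
whole = match.group("Name")  # named group: attribute name, quotes and value
inner = match.group(2)       # nested unnamed group: only the attribute value

print(whole)
print(inner)
```

The named group still holds the surrounding `meta content="` and `"`, while the inner group holds only the value.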

On the other hand, if you don't care about capturing meta content and just want to capture the value inside content, you can use:

content="(?<Name>.*?)"
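A small Python sketch of this last pattern, under the assumption that Python's `re` module is used (so the group is written `(?P<Name>...)`); the sample string is the tag from the question and the variable names are hypothetical.

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# Only the attribute value sits inside the named group,
# so the quotes and the attribute name are matched but not captured.
pattern = re.compile(r'content="(?P<Name>.*?)"')

match = pattern.search(html)
name = match.group("Name") if match else None

print(name)
```

Here the group Name should contain just the value, without `content="` or the closing quote.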

edited May 11 '15 at 00:05

answered May 10 '15 at 23:52

Federico Piazza

Thanks! But it looks like it also extracts `meta content="` and `"` along with Gizmo? – exlo May 10 '15 at 23:54
@exlo Look at the demo link; group index 2 contains Gizmo – Federico Piazza May 10 '15 at 23:55
I'm sorry, I'm new so I'm a little confused. If `Name Gizmo` is what I want, let's say using a simple demo like http://rubular.com/, then the match is `meta content="Colony Cafe"`. Is it possible to skip `"meta content="` and `"` altogether? And it's for an assignment, so I have to use regexes unfortunately – exlo May 10 '15 at 23:59
Got it! Thank you. But if I'm looking at a source where there are multiple meta content tags, and the identifier is `"og:title"`, would I make it `content="(?<Name>.*?)" property="og:title"`? I tried it, and it's returning no matches – exlo May 11 '15 at 00:02
@exlo Yes, it will work only if `og:title` comes after content, otherwise it won't work. – Federico Piazza May 11 '15 at 00:04
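
To illustrate the order dependence mentioned in the last comment, here is a hedged Python sketch. The two sample tags and all variable names are invented for the example, and Python's `(?P<Name>...)` spelling is assumed. It also tries a stricter `[^"]*` in place of `.*?`, which is not from the thread but may help when several meta tags sit on one line.

```python
import re

lazy = re.compile(r'content="(?P<Name>.*?)" property="og:title"')
strict = re.compile(r'content="(?P<Name>[^"]*)" property="og:title"')

content_first = '<meta content="Gizmo" property="og:title" />'
property_first = '<meta property="og:title" content="Gizmo" />'

# The pattern matches only when content comes right before property.
first = lazy.search(content_first)
second = lazy.search(property_first)

# With several tags on one line, .*? can run across tag boundaries,
# while [^"]* cannot leave the quoted value.
two_tags = ('<meta content="website" property="og:type" />'
            '<meta content="Gizmo" property="og:title" />')
lazy_name = lazy.search(two_tags).group("Name")
strict_name = strict.search(two_tags).group("Name")

print(first.group("Name"), second, strict_name)
```

If the attributes can appear in either order, a single pattern like this is not enough, which is one more reason an HTML parser is usually the safer choice.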
