
I'm trying to get the value of an attribute, given the preceding text. Say I have this:

`<meta content="firsttag" name="datatoget" />`

I want to get the value of `name` (`datatoget`). I know the text that comes before that attribute (`<meta ... name="`).

How can I, using regular expressions, get this value?

I came up with `content="firsttag" name=".*"`, which matches what I want, but together with the surrounding text. Basically, I only want the `.*` part.
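A quick check with Python's `re` module shows two problems with the pattern as written: `.group()` returns the whole matched text rather than just the value, and because `.*` is greedy it runs on to the *last* double quote when more attributes follow `name`. The second sample string, with an extra `id` attribute, is a made-up illustration, not from the question.

```python
import re

pattern = r'content="firsttag" name=".*"'

# Sample from the question: the match covers the attribute text, not just the value.
sample = '<meta content="firsttag" name="datatoget" />'
print(re.search(pattern, sample).group())   # content="firsttag" name="datatoget"

# Hypothetical string with another attribute after name: the greedy .*
# backtracks only as far as the last double quote in the string.
longer = '<meta content="firsttag" name="datatoget" id="x" />'
print(re.search(pattern, longer).group())   # content="firsttag" name="datatoget" id="x"
```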

ceeee
  • Obligatory: [Have you tried using an HTML parser instead?](http://stackoverflow.com/a/1732454/953482) I'm sure many parsing libraries have lovely `getAttribute` methods which would work quite well. – Kevin Sep 19 '12 at 15:07
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Are you pulling a whole HTML page, or do you just have this one string? If you have a full HTML document, please look at an HTML parser such as lxml or Beautiful Soup. – dm03514 Sep 19 '12 at 15:10
  • No, I'm not parsing complete documents. It's a bit of text that is identical every time, and I only need one simple value, so I figured a regex would be better. – ceeee Sep 19 '12 at 15:15
  • You are on the right track; just use `()` to capture the value you want: `content="firsttag" name="(.*?)"` – RanRag Sep 19 '12 at 15:21
  • Regex is probably fine in that case. As much as SO loves to link the answer that dm and I linked, you _can_ parse a string with regex if it's a small subset of HTML with no nested tags and other such Zalgo-summoning properties that have tripped up so many devs. – Kevin Sep 19 '12 at 15:24
  • Kevin, I promise to leave Cthulhu alone ;) @RanRag That sort of worked, but now it matches two things: the part I want, and also the whole part (content... "). – ceeee Sep 19 '12 at 15:29
  • @ceeee: I tried it with the string you mentioned above and it only outputs `[datatoget]`. – RanRag Sep 19 '12 at 15:31
  • I have it working now. It did indeed work, RanRag; I was testing it in a browser-based regex tester, and it turned out the tester was a bit buggy. In Python it worked great! Thanks. Maybe you can post it as an answer so I can accept it? – ceeee Sep 19 '12 at 15:40
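RanRag's capture-group suggestion, sketched with Python's `re` module. The "two things" seen in the browser tester are most likely group 0 (the entire match) and group 1 (the parenthesised capture); in Python you pick the one you want, and `re.findall` returns only the captured group, which matches the `[datatoget]` output RanRag reports. The helper name `get_name` is hypothetical.

```python
import re

# Non-greedy capture group: (.*?) stops at the first closing quote.
PATTERN = re.compile(r'content="firsttag" name="(.*?)"')

def get_name(text):
    """Hypothetical helper: return the captured value, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

sample = '<meta content="firsttag" name="datatoget" />'
match = PATTERN.search(sample)
print(match.group(0))           # whole match: content="firsttag" name="datatoget"
print(match.group(1))           # capture group only: datatoget

# With exactly one group in the pattern, findall returns just that group's text.
print(PATTERN.findall(sample))  # ['datatoget']
print(get_name(sample))         # datatoget
```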

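For comparison, the parser route that Kevin and dm03514 recommend can be done with Python's standard-library `html.parser`, with no third-party packages. This is a minimal sketch; the class name `MetaNameCollector` is hypothetical.

```python
from html.parser import HTMLParser

class MetaNameCollector(HTMLParser):
    """Hypothetical helper: collect the name attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs; tag names arrive lowercased.
        if tag == "meta":
            value = dict(attrs).get("name")
            if value is not None:
                self.names.append(value)

    # Self-closing tags such as <meta ... /> go to handle_startendtag, whose
    # default implementation calls handle_starttag and then handle_endtag.

parser = MetaNameCollector()
parser.feed('<meta content="firsttag" name="datatoget" />')
parser.close()
print(parser.names)  # ['datatoget']
```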
1 Answer

`(?<=name=")[a-zA-Z0-9]*`

`(?<=name=")` is a lookbehind: it requires the text `name="` to appear immediately before what we match, without including it in the result.

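`[a-zA-Z0-9]*` matches only letters and digits, so the match stops at the closing quote and the quote isn't included in it. (Strictly speaking there is no capture group here; the lookbehind makes one unnecessary.)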
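The lookbehind pattern, sketched in Python. Because there is no capture group, `.group()` returns just the value. Python's `re` only supports fixed-width lookbehinds, which `name="` is. The `[^"]*` variant (anything up to the next double quote) is a suggested alternative for values containing characters such as `-` or `_`, not part of the answer above, and the hyphenated sample string is made up.

```python
import re

sample = '<meta content="firsttag" name="datatoget" />'

# (?<=name=") must precede the match but is not part of the returned text.
answer_pattern = re.compile(r'(?<=name=")[a-zA-Z0-9]*')
print(answer_pattern.search(sample).group())          # datatoget

# Hypothetical value with hyphens: the alphanumeric class stops at the first "-".
hyphenated = '<meta content="firsttag" name="data-to-get" />'
print(answer_pattern.search(hyphenated).group())      # data

# Variant: match everything up to the next double quote instead.
any_char_pattern = re.compile(r'(?<=name=")[^"]*')
print(any_char_pattern.search(hyphenated).group())    # data-to-get
```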

Phillip Schmidt