
I have a wall of HTML from a source, and I need to extract '1929485' from it:

<input type="hidden" name="key" value="1929485" />

How would I do this? I found this online:

var match = Regex.Match(source, @"class="""" onclick=""NewWindow\('([^']*)',\s*'([^']*)',.*");

I'm unsure what all of this means and does.

Thanks.

Soner Gönül
user2911924
  • You should use an HTML parser (pick one for your language, there are many awesome ones), and not regex, as tempting as it is. This is the classic answer to why: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454. With a long HTML file you risk head-banging bugs, as regex can't fully parse HTML. – Robin Jan 24 '14 at 12:52
  • Already doomed anyways, teach me your ways! – user2911924 Jan 24 '14 at 12:57
  • See HtmlAgilityPack. And, this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Dmitriy Khaykin Jan 24 '14 at 13:16
  • Can you specify what you need to do? Find the value of the value-attribute for all input elements? Or is it something else? – flindeberg Jan 24 '14 at 14:47

1 Answer


First, use

int pos = htmlstring.IndexOf("1929485");

to find the index where the substring starts. Make sure there aren't any other instances, or the first one you find might not be the one you need.
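One way to check for other instances, as a rough sketch: count how often the substring occurs before trusting the first hit. The class and method names here (OccurrenceCheck, CountOccurrences) are made up for illustration and are not part of any library.

```csharp
using System;

public static class OccurrenceCheck
{
    // Hypothetical helper: counts non-overlapping occurrences of needle in haystack.
    public static int CountOccurrences(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(needle)) return 0;

        int count = 0;
        int pos = haystack.IndexOf(needle, StringComparison.Ordinal);
        while (pos >= 0)
        {
            count++;
            // Continue searching just past the match we found.
            pos = haystack.IndexOf(needle, pos + needle.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

If CountOccurrences(htmlstring, "1929485") returns anything other than 1, the first match may not be the tag you are after.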

Then search backward for the start of the tag and forward for its end, like this:

int startpos = htmlstring.LastIndexOf("<input", pos);
int endpos = htmlstring.IndexOf("/>", pos) + 2; // + 2 so the closing "/>" is included

Then extract the whole thing:

string htmltag = htmlstring.Substring(startpos, endpos - startpos);

I might be off by one character, so experiment a bit to fit your needs.
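The steps above only work if you already know the value you are looking for. When it is unknown, a similar sketch can anchor on the name="key" attribute from the question instead. The names KeyExtractor and ExtractHiddenValue are hypothetical, and the code assumes double-quoted attributes with name appearing before value inside the same tag, as in the question's snippet. For anything less regular, an HTML parser such as HtmlAgilityPack (suggested in the comments) is the safer choice.

```csharp
using System;

public static class KeyExtractor
{
    // Hypothetical helper: returns the value attribute of the first tag that
    // contains name="<inputName>", or null if it cannot be found.
    // Assumes double-quoted attributes and that name precedes value in the tag.
    public static string ExtractHiddenValue(string html, string inputName)
    {
        string nameAttr = "name=\"" + inputName + "\"";
        int namePos = html.IndexOf(nameAttr, StringComparison.Ordinal);
        if (namePos < 0) return null;

        // The tag ends at the next '>' after the name attribute.
        int tagEnd = html.IndexOf('>', namePos);
        if (tagEnd < 0) return null;

        const string valueAttr = "value=\"";
        int valuePos = html.IndexOf(valueAttr, namePos, StringComparison.Ordinal);
        if (valuePos < 0 || valuePos > tagEnd) return null; // value not in this tag

        int valueStart = valuePos + valueAttr.Length;
        int valueEnd = html.IndexOf('"', valueStart);
        if (valueEnd < 0 || valueEnd > tagEnd) return null;

        return html.Substring(valueStart, valueEnd - valueStart);
    }

    public static void Main()
    {
        string html = "<form><input type=\"hidden\" name=\"key\" value=\"1929485\" /></form>";
        Console.WriteLine(ExtractHiddenValue(html, "key")); // prints 1929485
    }
}
```

Returning null rather than throwing lets the caller decide what a missing field means; the bounds checks against tagEnd keep the search from wandering into a later tag.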

pid