
I have the following text file:

<div>
    <asp:HyperLinkField HeaderText="Hello World of Regular Expression" />
</div>

How do I get the text between the double quotes, whether it is a single word or several words separated by spaces?

Updated:

//This one gets me close but doesn't get me strings with spaces in them
var match = Regex.Match(tokens[1], @"HeaderText=\""(\w+)\"""); 
//This was suggested below. It shows correct match count but values are just empty strings
var match = Regex.Match(tokens[1], @"HeaderText=""[^""]+""|[\w]+");

if (match.Success)
{
    yield return new KeyValuePair<string, string>(
        file, match.Groups[1].Value //This is empty for 2nd scenario
    );
}
Rod
  • Don't parse html with Regex: http://stackoverflow.com/a/1732454/426894 – asawyer Jan 09 '12 at 14:32
  • It's not really Html but ASP which will produce Html. He may want to parse it before it's generated. – Patrick Desjardins Jan 09 '12 at 14:33
  • @Daok: It makes no difference at all. HTML could also be parsed with an XML parser, which is what "the correct" approach is in these cases. – Jon Jan 09 '12 at 14:35
  • @Daok You got it. I am searching for maintenance purposes and not as part of application of sort – Rod Jan 09 '12 at 14:39
  • @rod Can you run the regex against the HeaderText property of the field directly in code behind? – asawyer Jan 09 '12 at 14:41
  • @Rod Gotcha. You'll want to use a proper Html parsing engine then for sure, not regular expressions. Here's a free easy to use library that should work great for this purpose: http://htmlagilitypack.codeplex.com/ – asawyer Jan 09 '12 at 14:54
  • @Jon It makes a difference as you can see rod answer after yours. Have a nice day. – Patrick Desjardins Jan 09 '12 at 15:12
  • sorry all, correction: all i'm doing is collecting all strings related to HeaderText and putting them in a generic collection. See this post for the actual data file: http://stackoverflow.com/questions/8779790/save-filename-and-headertext-value-as-key-value-pair-collection -- It's not a html file it just happens to have html syntax – Rod Jan 09 '12 at 15:15
  • @rod It would be helpful to include that sort of information upfront. I still think though, that you'll be better off with a proper html parser, using it to grab the HeaderText property text. – asawyer Jan 09 '12 at 15:32

1 Answer


Try this one:

"[^"]+"|[\w]+

This returns a list of matches: each quoted string in full (quotes included), plus each individual word that appears outside quotes.
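
Note that this pattern contains no capture group, which is most likely why `match.Groups[1].Value` comes back empty in the question's second attempt. A minimal sketch of one way around that, assuming the goal is only the text inside the `HeaderText` quotes: wrap the `[^"]*` part in parentheses so it becomes group 1. The class and method names (`HeaderTextExtractor`, `Extract`) are hypothetical and chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderTextExtractor
{
    // In a verbatim string, "" stands for one literal double quote.
    // The parentheses form capture group 1: everything between the quotes.
    private static readonly Regex HeaderTextPattern =
        new Regex(@"HeaderText=""([^""]*)""");

    // Hypothetical helper: yields each HeaderText value found in the input.
    public static IEnumerable<string> Extract(string input)
    {
        foreach (Match m in HeaderTextPattern.Matches(input))
        {
            yield return m.Groups[1].Value;
        }
    }

    public static void Main()
    {
        string sample =
            "<asp:HyperLinkField HeaderText=\"Hello World of Regular Expression\" />";

        foreach (string value in Extract(sample))
        {
            Console.WriteLine(value); // prints: Hello World of Regular Expression
        }
    }
}
```

Because `[^"]*` matches any character other than a double quote, spaces inside the value are captured along with the words, which is where the original `\w+` attempt fell short.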

Roy Dictus