
I'm trying to write a regex that will look for the width and height attributes in a string (which will always be an HTML iframe) and replace their values.

What I have is a string like the one below, where ### could be any value and is not necessarily 3 digits.

string iFrame = "<iframe width=\"###\" height=\"###\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";

I want to end up with set values for the width and height:

<iframe width="315" height="215" src="http://www.youtube.com/embed/xxxxxx" frameborder="0" allowfullscreen></iframe>

I tried this, but I'm not good with regular expressions:

iFrame = Regex.Replace(iFrame, "width=\".*\"", "width=\"315\"");
iFrame = Regex.Replace(iFrame, "height=\".*\"", "height=\"215\"");

which resulted in:

<iframe width="315" allowfullscreen></iframe>

which is not what I want. Can someone help me?

Ben

3 Answers


Change your patterns to these:

"width=\"([0-9]{1,4})\""

and

"height=\"([0-9]{1,4})\""

Basically, you were using .*, which is greedy: it grabs as many characters as possible, so your match ran all the way to the last quote in the string. The patterns above instead look for a digit character [0-9] repeated between 1 and 4 times {1,4}, which is what you are really looking for.
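To make the difference concrete, here is a small sketch that runs the question's original greedy patterns and the digit-only patterns side by side. It assumes real numeric values (560 and 349 are made up for illustration) in place of the ### placeholders, since [0-9]{1,4} would not match a literal "###". The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Sample markup from the question, with assumed numeric width/height values.
    public const string Sample =
        "<iframe width=\"560\" height=\"349\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";

    // The question's original patterns: ".*" is greedy, so each match
    // runs from width=" all the way to the LAST quote in the string.
    public static string ReplaceGreedy(string html)
    {
        html = Regex.Replace(html, "width=\".*\"", "width=\"315\"");
        // By now the height attribute has been swallowed, so this finds nothing.
        return Regex.Replace(html, "height=\".*\"", "height=\"215\"");
    }

    // The answer's patterns: only 1 to 4 digits may appear between the quotes,
    // so each match stops at the closing quote of its own attribute.
    public static string ReplaceDigits(string html)
    {
        html = Regex.Replace(html, "width=\"([0-9]{1,4})\"", "width=\"315\"");
        return Regex.Replace(html, "height=\"([0-9]{1,4})\"", "height=\"215\"");
    }

    public static void Main()
    {
        // Prints: <iframe width="315" allowfullscreen></iframe>
        Console.WriteLine(ReplaceGreedy(Sample));
        // Prints the iframe with width="315" height="215" and every other attribute intact.
        Console.WriteLine(ReplaceDigits(Sample));
    }
}
```

One limitation worth knowing: the digit-only patterns will leave a value such as width="100%" untouched, because % is not in [0-9].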

Shai Cohen
  • Thank you for the response, that worked for me! And thanks for the explanation. Regular expressions are on my list of things to learn... – Ben Nov 16 '11 at 00:26
  • Glad it helped. Also, do a quick Google search for regex pattern testers: they are small programs that let you test a regex instantly, which is probably the best way to learn. – Shai Cohen Nov 16 '11 at 15:08

You are better off using the HTML Agility Pack to parse and query HTML. It handles HTML fragments well.

RegEx is not a good solution for parsing HTML, as this SO answer may convince you.
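As a rough sketch of what that could look like, the snippet below loads the fragment with the HTML Agility Pack, finds every iframe via XPath, and overwrites its width and height attributes. It assumes the HtmlAgilityPack NuGet package is installed; the AgilityPackDemo class and Resize helper are hypothetical names. Because the parser deals with attributes rather than text, the original "###" placeholder values work as-is.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class AgilityPackDemo
{
    // Hypothetical helper: set width/height on every <iframe> in an HTML fragment.
    public static string Resize(string html, int width, int height)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var iframes = doc.DocumentNode.SelectNodes("//iframe");
        if (iframes != null)
        {
            foreach (var iframe in iframes)
            {
                // SetAttributeValue updates an existing attribute or adds a missing one.
                iframe.SetAttributeValue("width", width.ToString());
                iframe.SetAttributeValue("height", height.ToString());
            }
        }

        // Serialize the (modified) fragment back to a string.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string iFrame = "<iframe width=\"###\" height=\"###\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";
        Console.WriteLine(Resize(iFrame, 315, 215));
    }
}
```

The trade-off is an extra dependency; in exchange, attribute order, quoting style, and odd values like "100%" or "###" stop mattering.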

Oded
  • Is there any answer other than `RegEx is not a good solution for parsing HTML, use HTML Agility Pack`? Even the referenced library is always the same. I should save this string as a reputation hunter :) – L.B Nov 15 '11 at 22:37
  • @L.B - RegEx is _not_ a good solution for arbitrary HTML. And the library is a very good library, which is why it is referred to so often. Is this a problem? – Oded Nov 15 '11 at 22:39
  • Isn't there any other library? Or is it always the right choice, even for simple parsing? – L.B Nov 15 '11 at 22:41
  • @L.B - It is good for HTML fragments and badly formed HTML, and it is fast. If you know the incoming format (and that it will not change or vary much), by all means use RegEx. – Oded Nov 15 '11 at 22:43

I agree that regex isn't the best way to work with HTML. The problem with your example is the .* in your regex, which matches every character, including spaces, up to the last " in the string. Change it to the code below, which matches only non-whitespace characters between the quotes.

iFrame = Regex.Replace(iFrame, @"width=""[^\s]*""", "width=\"315\"");
iFrame = Regex.Replace(iFrame, @"height=""[^\s]*""", "height=\"215\"");
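Wrapped in a small helper, the two lines above can be checked against the question's exact input. This is a sketch; the NonWhitespaceDemo class and Resize method are hypothetical names, and the patterns are the ones from this answer, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonWhitespaceDemo
{
    // Hypothetical helper around the answer's two patterns.
    public static string Resize(string iFrame)
    {
        // [^\s]* matches a run of non-whitespace characters, so a match cannot
        // cross the space that separates one attribute from the next. The
        // regex engine backtracks off the closing quote so the final "" in
        // the verbatim pattern (a literal quote) can match it.
        iFrame = Regex.Replace(iFrame, @"width=""[^\s]*""", "width=\"315\"");
        iFrame = Regex.Replace(iFrame, @"height=""[^\s]*""", "height=\"215\"");
        return iFrame;
    }

    public static void Main()
    {
        string iFrame = "<iframe width=\"###\" height=\"###\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";
        // Prints the iframe with width="315" height="215" and the other attributes intact.
        Console.WriteLine(Resize(iFrame));
    }
}
```

Unlike the digit-only patterns, this accepts any value, including "###" or "100%", as long as it contains no whitespace. A common variant (not part of this answer) is [^"]*, which stops at the first closing quote instead of at whitespace.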
Richard