
I have a database containing Page objects with HTML content. A lot of the rows in the database contain this markup:

  <p style="float: left; margin-right: 20px; height: 300px;">
        <img src="...">More html ...
 </p>

So I created a super simple regex replace:

 foreach (var page in db.Pages)
 {
     string pattern = @"<p style=""float: left; margin-right: 20px;"">(.*)</p>/ms";
     if (Regex.Match(page.Content, pattern).Success)
     {
         page.Content = Regex.Replace(page.Content, pattern, "<div class=\"contentimage\" >$1</div>");
     }
 }
 // db.SubmitChanges();

When I run the regex in a regex testing tool it works, but in C# code it doesn't. Can anyone help me out, please?

If anyone knows how to do this update with a regex replace in SQL, that would be fine too.

Regex isn't my strongest point (although that's a great shame), but it is on my list of things to learn ASAP ;)

Nealv
  • I hate to say it, but regex is **really** not the tool of choice for processing HTML... – Marc Gravell Jul 26 '10 at 20:45
  • Come now Marc, have you never read a perl web script? Those guys make clear that regex is the tool of choice for everything! Unless you're one of those lamo microsoft developers who think code should be readable, and regex should have a standard non-language specific set of instructions.. – Jimmy Hoffa Jul 26 '10 at 20:58
  • For all those who think regex should be used to process HTML, I recommend a good read: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 IMHO, every time someone tags a question with both `regex` and `html`, they should be pointed to this answer. – Darin Dimitrov Jul 26 '10 at 21:08
  • @Darin +1 for referencing that question – NickAldwin Jul 27 '10 at 13:12

1 Answer


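Your problem is the "/ms" at the end of the pattern. You're trying to specify a couple of regex flags, but C# specifies flags differently than PHP/Perl do: .NET has no /pattern/flags syntax, so "/ms" is treated as three more literal characters that must appear right after the </p>. Your regex tester probably tests regexes aimed at those languages; I suggest Expresso (it's free) for working with .NET regexes. Change your pattern to this: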

string pattern = @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>";

(Also note that I added the "height" declaration to the style attribute so that it matches your sample HTML. Was leaving it out just a typo?)

And change your Regex.Match call to this:

if (Regex.Match(page.Content, pattern, RegexOptions.Multiline | RegexOptions.Singleline).Success)

And it should work.
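For anyone else hitting this, here is a minimal sketch of why the trailing "/ms" fails in .NET and what to use instead. The class name FlagDemo and the simplified sample HTML are made up for illustration. It compares the Perl-style suffix (matched literally, so it never matches here), passing RegexOptions.Singleline as an argument, and the equivalent inline (?s) option. Singleline is the option doing the work: it lets . match newlines. Multiline only changes how ^ and $ behave, and this pattern uses neither.

```csharp
using System.Text.RegularExpressions;

public static class FlagDemo
{
    // Simplified sample: the paragraph content spans a line break.
    const string Html = "<p style=\"float: left;\">\n<img src=\"a.png\"></p>";

    // Perl/PHP-style suffix: in .NET "/ms" is just three literal characters,
    // so this only matches text containing "</p>/ms".
    public static bool PerlStyle() =>
        Regex.IsMatch(Html, @"<p style=""float: left;"">(.*)</p>/ms");

    // .NET way 1: pass the option as a separate argument.
    public static bool WithOptionsArgument() =>
        Regex.IsMatch(Html, @"<p style=""float: left;"">(.*)</p>", RegexOptions.Singleline);

    // .NET way 2: the inline (?s) option at the start of the pattern.
    public static bool WithInlineOption() =>
        Regex.IsMatch(Html, @"(?s)<p style=""float: left;"">(.*)</p>");

    // Without Singleline, '.' stops at '\n', so the multi-line paragraph is missed.
    public static bool WithoutSingleline() =>
        Regex.IsMatch(Html, @"<p style=""float: left;"">(.*)</p>");
}
```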

[EDIT] Oh, and fix the Replace call the same way:

page.Content = Regex.Replace(page.Content, pattern, "<div class=\"contentimage\" >$1</div>", RegexOptions.Multiline | RegexOptions.Singleline);
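Putting the pieces together, the whole update loop might look roughly like the sketch below. Page and ConvertFloatedParagraphs are hypothetical stand-ins (a plain list of pages instead of the LINQ to SQL db.Pages table), and only RegexOptions.Singleline is passed since the pattern uses no ^ or $. The separate Match(...).Success check is dropped because Regex.Replace returns the input unchanged when nothing matches; comparing the result is enough to know whether a page changed.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Page entity from the question.
public class Page
{
    public string Content { get; set; }
}

public static class PageMigration
{
    // The answer's pattern without the Perl-style "/ms" suffix.
    // Singleline lets '.' match newlines inside the paragraph.
    static readonly Regex FloatedParagraph = new Regex(
        @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>",
        RegexOptions.Singleline);

    // Rewrites every matching page and returns how many pages changed.
    public static int ConvertFloatedParagraphs(IEnumerable<Page> pages)
    {
        int changed = 0;
        foreach (var page in pages)
        {
            if (page.Content == null) continue; // Replace throws on null input

            string updated = FloatedParagraph.Replace(
                page.Content, "<div class=\"contentimage\" >$1</div>");
            if (updated != page.Content)
            {
                page.Content = updated;
                changed++;
            }
        }
        return changed;
    }
}
```

With LINQ to SQL you would presumably pass db.Pages and call db.SubmitChanges() once after the loop.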
NickAldwin
  • And I completely agree with Marc that unless your HTML is always going to be very similar to your example, Regex is not really the way to go. – NickAldwin Jul 26 '10 at 20:53
  • Thanks a lot, it worked like a charm. And @Marc Gravell: regex was the right tool for this job. Try doing this in fewer than 10 lines with an HTML parser :D This works like a charm, ergo: regex 1 - HTML parser 0 ;) I wasn't a fan of regex myself, but more and more I'm becoming one. – Nealv Jul 26 '10 at 21:12
  • Well, as long as the HTML is always perfectly formed like this, it'll work OK. In any other case, though, it'll break. – NickAldwin Jul 27 '10 at 13:06
  • It worked great, and of course I had already figured out the replace ;) Great help, thanks again. – Nealv Jul 27 '10 at 13:25
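One concrete way this pattern can break, in the spirit of the comments above: if a single page contains two of these floated paragraphs, the greedy (.*) runs from the first opening tag to the last </p> on the page and merges everything in between into one div. The sketch below (the GreedyVsLazy class is hypothetical) compares it with the lazy (.*?), which stops at the first </p> after each opening tag. The lazy version is a change from the answer's pattern, and neither version copes with variations such as a different attribute order or extra whitespace in the style attribute.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    const string Open = @"<p style=""float: left; margin-right: 20px; height: 300px;"">";
    const string Replacement = "<div class=\"contentimage\" >$1</div>";

    // Greedy: (.*) extends to the LAST </p> in the content.
    public static string Greedy(string html) =>
        Regex.Replace(html, Open + "(.*)</p>", Replacement, RegexOptions.Singleline);

    // Lazy: (.*?) stops at the FIRST </p> after each opening tag.
    public static string Lazy(string html) =>
        Regex.Replace(html, Open + "(.*?)</p>", Replacement, RegexOptions.Singleline);
}
```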