I want to remove the Google ad HTML from the text, for example:

xxxxxxx<div class="gg200x300" style="padding: 19px; margin: 0px 22px 0px 0px; overflow: hidden; text-align: center; font-size: 0px; line-height: 0; float: left; border: 1px solid rgb(229, 229, 229); color: rgb(37, 37, 37); font-family: 宋体, sans-serif;"><iframe src="http://g.163.com/r?site=netease&amp;affiliate=news&amp;cat=article&amp;type=logo300x250&amp;location=13" width="300" height="250" frameborder="no" border="0" marginwidth="0" marginheight="0" scrolling="no"></iframe></div>yyyyyy

I want to remove the HTML between xxxxxxx and yyyyyy, so that the result is:

xxxxxxxyyyyyy 

How do I write a regular expression for this in C#, and could you please explain why a regular expression should be used? Thanks.

qizweb

2 Answers

If it is always in the div, you can do something like this.

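int start = a.IndexOf("<div");
int end = start >= 0 ? a.IndexOf("</div>", start) : -1;
if (end >= 0)
{
   // Remove everything from "<div" through the end of "</div>".
   // The second argument of Remove is a character count, not an end index.
   Console.WriteLine(a.Remove(start, end - start + "</div>".Length));
   //output xxxxxxxyyyyyy
}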

This is not a complete answer, but it should at least get you going. I'm not very good with regex, but my hunch is that it will be tough to write one for this string. Hope this helps.

EDIT

To make life easier, wrap that div in another div, like this:

<div id="googleadd">.......</div>

Then search based on that.

 if (a.IndexOf("<div id=\"googleadd\">") >= 0)
 {
  :
  :
 }

Then you know exactly what you're deleting.
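
One thing to watch with a wrapper: the first "</div>" after the wrapper's opening tag closes the inner ad div, not the wrapper itself. A minimal sketch of one way to handle that is to count nested divs until the matching close is reached. The class and method names here (AdStripper, RemoveDiv) are hypothetical, and the code assumes plain lowercase <div ...> and </div> tags in balanced markup:

```csharp
using System;

static class AdStripper
{
    // Hypothetical helper: removes the element that begins with openTag,
    // counting nested <div> tags so that the matching </div> is found.
    public static string RemoveDiv(string html, string openTag)
    {
        int start = html.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0) return html;            // wrapper not present

        int depth = 0;
        int pos = start;
        while (pos < html.Length)
        {
            int nextOpen = html.IndexOf("<div", pos, StringComparison.Ordinal);
            int nextClose = html.IndexOf("</div>", pos, StringComparison.Ordinal);
            if (nextClose < 0) return html;    // unbalanced markup: leave the text untouched

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                depth++;                       // entered a div (the wrapper or a nested one)
                pos = nextOpen + "<div".Length;
            }
            else
            {
                depth--;                       // left a div
                pos = nextClose + "</div>".Length;
                if (depth == 0)
                    return html.Remove(start, pos - start);
            }
        }
        return html;
    }
}
```

It would be called as AdStripper.RemoveDiv(a, "<div id=\"googleadd\">"), which should return the text with the wrapper and everything inside it removed.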

gmail user

If it is always the same class, it would be very easy to use @gmail user's method, adapted to look for that class:

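// Look for the ad div by its class rather than for the first div of any kind.
int start = a.IndexOf("<div class=\"gg200x300\"");
int end = start >= 0 ? a.IndexOf("</div>", start) : -1;
if (end >= 0)
{
    // Remove the ad div, from its opening tag through its closing tag.
    Console.WriteLine(a.Remove(start, end - start + "</div>".Length));
    //output xxxxxxxyyyyyy
}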

I would not use a regex for this as it will be overly complicated for what you are really looking for and might create false positives unless very specific. It is simple enough to look for a div of a certain class and remove that.
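
For comparison, since the question asked for one, here is a rough sketch of what a "very specific" regex could look like. The class name AdRegex and method name Strip are made up for illustration, and the pattern assumes the ad div has no other div nested inside it, because the lazy match stops at the first "</div>":

```csharp
using System;
using System.Text.RegularExpressions;

static class AdRegex
{
    // Matches a <div> whose opening tag contains class="gg200x300",
    // then lazily consumes everything up to the first </div> after it.
    // [^>]* cannot cross a '>', so the class must be in that same opening tag.
    private static readonly Regex AdDiv = new Regex(
        @"<div\b[^>]*\bclass=""gg200x300""[^>]*>.*?</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns html with every matching ad div removed.
    public static string Strip(string html)
    {
        return AdDiv.Replace(html, string.Empty);
    }
}
```

RegexOptions.Singleline lets "." match newlines, so the ad markup may span several lines. Divs with any other class are left alone, which is the kind of specificity needed to avoid false positives.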

jamesthollowell