<div class="socialMedia">
    <div id="divLinkedin" style="width:100px;height:0px;">
        <script src="//platform.linkedin.com/in.js" type="text/javascript"></script>
        <script data-counter="right" type="IN/Share"></script>
        <!-- Facebook share button Start -->
    </div>
    <div id="divFb" style="float: left;margin-left:100px;">
        <a expr:share_url="data:post.url" href="http://www.facebook.com/sharer.php" name="fb_share" type="button_count">Share</a>
        <script src="http://static.ak.fbcdn.net/connect.php/js/FB.Share" type="text/javascript"></script>
        <!-- Facebook share button End -->
   </div>
   <div id ="divTw" style="float: left;margin-left:10px;">
       <a class="twitter-share-button" data-lang="en" href="https://twitter.com/share">Tweet</a>
      <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="https://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
   </div>
   <br />
   <br />
</div>

I need a regular expression that matches the content inside the div element with class='socialMedia'. Everything inside that div must be replaced with an empty string. How can I do that?

James
Sampath

1 Answer


You can't parse HTML with regex in a reliable fashion; detecting end tags correctly is a major issue, especially when the same element is nested, as the divs in your snippet are. A good SO post explains why not to use regex: "Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions."

Use HTML Agility Pack instead.

e.g.

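using HtmlAgilityPack;

// LoadHtml expects markup, not a URL, so HtmlWeb is used to download the page first.
HtmlDocument htmlDocument = new HtmlWeb().Load("http://www.YOURURL.com");

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection nodes = htmlDocument.DocumentNode.SelectNodes("//div[@class='socialMedia']");
if (nodes != null)
{
    foreach (HtmlNode selectNode in nodes)
    {
        string divContents = selectNode.InnerText;
        // Do Stuff, e.g. selectNode.RemoveAllChildren() to empty the div
    }
}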
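
The question asks for the div's contents to become an empty string, and the comments mention that the markup arrives as RSS content (a string) rather than a URL. A minimal sketch of that case follows, assuming HTML Agility Pack is installed from NuGet. The class name `SocialMediaStripper` and the method `EmptySocialMedia` are hypothetical names chosen for illustration and are not part of the original answer.

```csharp
using HtmlAgilityPack;

public static class SocialMediaStripper
{
    // Hypothetical helper: returns the markup with every
    // <div class="socialMedia"> emptied of its children.
    public static string EmptySocialMedia(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml takes markup, not a URL

        // SelectNodes returns null rather than an empty collection on no match.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='socialMedia']");
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                node.RemoveAllChildren(); // keeps the div itself, drops its contents
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `SocialMediaStripper.EmptySocialMedia(rssItemHtml)` should leave the outer `socialMedia` div in place but empty. To drop the div as well, `node.Remove()` could be used instead of `RemoveAllChildren()`. Note that the XPath `@class='socialMedia'` only matches when that is the element's sole class.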
Community
Paul Zahra
  • Yep, you're right. But I have done everything so far using regex, so learning another framework right now just for this will take a long time. Can't I have a temporary solution for this using regex? – Sampath Feb 19 '14 at 12:06
  • Is the content of the socialMedia div dynamic? If it is, then you're majorly screwed. There is no real learning involved; the code I gave will do what you want. Install HTML Agility Pack with NuGet, it couldn't be much simpler. – Paul Zahra Feb 19 '14 at 12:08
  • Nope. It's static content. I read the RSS content of the Blogger page from inside the MVC app. – Sampath Feb 19 '14 at 12:09
  • Seriously, don't do it! The regex for your HTML would be a complete mess and very flaky. Have a look at the regex in an answer here: http://stackoverflow.com/questions/11017583/extract-content-from-div-class-div-tag-c-sharp-regex – Paul Zahra Feb 19 '14 at 12:21