
Help out a newbie here. I am trying to check forum posts for duplicate content. So far I have downloaded the page source with WebClient and tried both Regex and MSHTML, without any luck. With MSHTML I get the lines, but not the way I want: I cannot separate the individual comments. The source I am trying to read is below:

    <p>
        Hey Alton!</p>
    <p>
        I am facing this problem also but i have search on the internet for the solution. There are few things that we need to do to solve this problem.</p>
    <p>
        First of all make sure that you have latest drivers for you Graphics Card.</p>

The code I have tried so far:

Regex:

    Dim r As New System.Text.RegularExpressions.Regex("<p> .* </p>")
    Dim matches As MatchCollection = r.Matches(result)
    For Each itemcode As Match In matches
        ListBox1.Items.Add(itemcode.ToString)
    Next
  • I think you need to explain what you are trying to do a little better. Are you trying to compare content within `p` tags? – ianbarker Dec 14 '12 at 15:50
  • The [HTML Agility Pack](http://htmlagilitypack.codeplex.com) is a must for what you are trying to do – Oded Dec 14 '12 at 15:52
  • Do you know that the most upvoted answer on this site explains why it is a really bad idea to use regex to parse HTML? [See here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Steve Dec 14 '12 at 16:05
  • @ianbarker The source I have posted is part of a comment which is inside a `<div>`. It goes like this:

    ....

    ....

    ....

    I am trying to get the lines between `<p>` and `</p>` in each division. There are 4 comments, which means there are 4 `<div>`s containing the data inside `<p>` tags. I am trying to use the regex to get all the comments and store them in a list box. @Oded I downloaded the HTML Agility Pack and tried to use it, but couldn't find any helpful tutorials.

    – Shahriar Shamit Dec 14 '12 at 16:40
  • @Steve I read the thread, but it didn't give me a way out. – Shahriar Shamit Dec 14 '12 at 16:48
  • Post a link to the page and explain what you want to do. I mean, after you have received the content of all the `p` tags, then what? – Steve Dec 14 '12 at 21:50

1 Answer

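    ' Requires Imports System.Text.RegularExpressions at the top of the file.
    ' (.+?) is lazy, so each match stops at the nearest </p>;
    ' RegexOptions.Singleline lets "." match the line breaks inside a paragraph.
    Dim regexObj As New Regex("<p>(.+?)</p>", RegexOptions.Singleline)
    Dim matchResults As Match = regexObj.Match(subjectString)
    While matchResults.Success
        ' Groups(1) holds the text captured between <p> and </p>
        ListBox1.Items.Add(matchResults.Groups(1).Value.Trim())
        matchResults = matchResults.NextMatch()
    End While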
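A minimal sketch of the same regex approach wrapped in a reusable function, assuming the downloaded page source is already in a string. `ParagraphExtractor` and `ExtractParagraphs` are hypothetical names, not part of the original answer. Like the answer, it only matches bare `<p>` tags without attributes; as the comments point out, a real HTML parser is more robust for less regular markup.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module ParagraphExtractor

    ' Hypothetical helper: returns the trimmed text of every <p>...</p> pair.
    ' (.+?) is lazy, so each match ends at the nearest </p> instead of the last one;
    ' RegexOptions.Singleline lets "." also match the line breaks inside a paragraph.
    Public Function ExtractParagraphs(html As String) As List(Of String)
        Dim results As New List(Of String)()
        Dim pattern As New Regex("<p>(.+?)</p>", RegexOptions.Singleline)
        Dim m As Match = pattern.Match(html)
        While m.Success
            ' Groups(1) is the text captured between the tags
            results.Add(m.Groups(1).Value.Trim())
            m = m.NextMatch()
        End While
        Return results
    End Function

End Module
```

In the question's form, something like `For Each text As String In ExtractParagraphs(result) : ListBox1.Items.Add(text) : Next` would fill the list box, where `result` is the downloaded page source.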
Ahmed KRAIEM