<p style="color: rgb(34, 34, 34); font-family: Arial, Verdana, sans-serif; font-size: 12px; line-height: normal;">My name is Faysal </p>

I want to extract only the string "My name is Faysal". I've written the following snippet, but it returns nothing. What do I need to change?

        WebClient web = new WebClient();
        String html = web.DownloadString("http://www.dmp.gov.bd/application/index/pressdetails/press_159");


        MatchCollection m1 = Regex.Matches(html, "<p style=\"color: rgb(34, 34, 34); font-family: Arial, Verdana, sans-serif; font-size: 12px; line-height: normal;\">\\s*(.+?)\\s*</p>", RegexOptions.Singleline);


        foreach (Match m in m1) {
            String head = m.Groups[1].Value;

            Console.WriteLine(head);
        }
Steven Doggart
Faysal Ahmed
  • [Wouldn't you prefer a nice HTML parser instead?](http://stackoverflow.com/a/1732454/102937) – Robert Harvey Dec 12 '13 at 21:06
  • @RobertHarvey Yeah, I was about to propose HTML Agility Pack http://htmlagilitypack.codeplex.com/ – Francis Ducharme Dec 12 '13 at 21:07
  • I know it can be done through HTML Agility Pack, but I want to make this code work anyhow. – Faysal Ahmed Dec 12 '13 at 21:17
  • Yeah, I know how you feel. Most people feel that cars are the best highway vehicles, but I wanted to be the first to traverse the Mass Pike with a yacht. (To be honest, I understand that feeling that "I'm 99% there anyway" - but as someone who's tried, trust me - HTML will always throw a formatting curveball at you.) – Katana314 Dec 12 '13 at 21:58

1 Answer

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

retrieved from "RegEx match open tags"...

I hope you will learn just like I did a long time ago. You can NOT parse HTML using RegEx. It is more efficient to use a parser built for HTML.

  • If your page is in XML or XHTML, you can use the built-in parsing libraries.
    For example, System.Xml.XmlDocument.

  • If it is pure HTML, use HtmlAgilityPack, or another similar parser.

What I would do in your case is select the first p element whose style attribute is set to the value you are looking for.
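As a rough sketch of that idea using only the built-in System.Xml.XmlDocument: the helper name FirstStyledParagraph and the small XHTML fragment are made up for illustration, and the approach assumes the markup is well-formed XML with no default namespace. Real-world HTML often is not, in which case a parser such as HtmlAgilityPack would be the better fit.

```csharp
using System;
using System.Xml;

public static class ParagraphFinder
{
    // Hypothetical helper: returns the trimmed text of the first <p> whose
    // style attribute contains styleFragment, or null if there is none.
    public static string FirstStyledParagraph(string xhtml, string styleFragment)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        // styleFragment is spliced into the XPath, so it must not contain a single quote.
        // SelectSingleNode returns the first match in document order.
        XmlNode node = doc.SelectSingleNode("//p[contains(@style, '" + styleFragment + "')]");
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Made-up fragment standing in for the downloaded page.
        string xhtml =
            "<div>" +
            "<p style=\"color: rgb(34, 34, 34); font-size: 12px;\">My name is Faysal </p>" +
            "</div>";

        Console.WriteLine(FirstStyledParagraph(xhtml, "rgb(34, 34, 34)"));
    }
}
```

Matching on a fragment of the style value with contains() is a deliberate choice here: it keeps working if the site reorders or adds other style declarations.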

.

No, please don't look down here!

.

Excuse me mods, if this answer is too long.

.

What you see below is UGLY, and NOT RECOMMENDED! I BEG OF YOU, DON'T LOOK!

.

"lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ "

.

If you absolutely have your heart set on using RegEx (kill me for saying this), then try the following expression.

<p style=\"color: rgb\(34, 34, 34\); font-family: Arial, Verdana, sans-serif; font-size: 12px; line-height: normal;\">\s*(.+?)\s*</p>

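It's the same as yours, except that the parentheses after "rgb" are escaped. Unescaped, "(34, 34, 34)" is read as a capturing group rather than as literal parentheses, so your pattern looks for the text "rgb34, 34, 34" and never matches. The expression above is written as a plain regex, with "\s" instead of "\\s"; in a regular (non-verbatim) C# string literal, the backslashes before the parentheses and before "s" must be doubled: "\\(", "\\)" and "\\s".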
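For illustration, here is one way the corrected expression could be dropped into C#. This sketch uses a verbatim string (@"...") so that backslashes reach the regex engine unchanged and a literal quote is written as ""; the class and method names are made up, and the input is the sample markup from the question rather than the live page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Verbatim string: \( \) and \s are passed through as-is; "" is a literal quote.
    const string Pattern =
        @"<p style=""color: rgb\(34, 34, 34\); font-family: Arial, Verdana, sans-serif; font-size: 12px; line-height: normal;"">\s*(.+?)\s*</p>";

    // Hypothetical helper: returns the captured text of every matching paragraph.
    public static List<string> ExtractParagraphs(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Sample markup from the question, standing in for the downloaded page.
        string html = "<p style=\"color: rgb(34, 34, 34); font-family: Arial, Verdana, sans-serif; font-size: 12px; line-height: normal;\">My name is Faysal </p>";

        foreach (string text in ExtractParagraphs(html))
        {
            Console.WriteLine(text); // prints "My name is Faysal"
        }
    }
}
```

The pattern still depends on the style attribute matching character for character, so any change in spacing or declaration order on the page will make it return nothing again.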

Edit

If it helps, I viewed the HTML from that website, and I could not find "My name is Faysal".

Kayla