
I want to scrape information from a website that lists product file names and profile serial numbers.

How can I scrape the product serial number when a new serial number appears every time? The HTML looks like this:

<pre> <td><b>product file number </b> 7269</td> </pre> 
<pre> <td><b>product file number </b> 7562</td> </pre> 
<pre> <td><b>product file number </b> 7502</td> </pre>

I am new to Windows Forms applications, so please provide the full code if possible. I would really appreciate any help.

  • [Bad idea.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) Use an HTML parser :) – Sam Sep 01 '15 at 23:05

1 Answer


You can treat the data as XML and query it with LINQ to XML, as long as the markup is well-formed:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication45
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
               "<pre> <td><b>product file number </b> 7269</td>  </pre>" +
               "<pre> <td><b>product file number </b> 7562</td> </pre>" +
               "<pre> <td><b>product file number </b> 7502</td> </pre>";

            //add root tag around data so you have only one root tag
            input = string.Format("<Root>{0}</Root>", input);

            XElement root = XElement.Parse(input);
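            // Each <pre> holds one <td>: a <b> label followed by a text node with the number
            var products = root.Descendants("pre").Select(x => new {
                name = x.Descendants("b").First().Value.Trim(),
                number = int.Parse(x.Element("td").Nodes().OfType<XText>().Last().Value.Trim())
            }).ToList();

            // Print each label with its parsed number
            foreach (var product in products)
            {
                Console.WriteLine("{0}: {1}", product.name, product.number);
            }
        }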

    }

}
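
Markup taken from a live page is often not well-formed XML, and in that case `XElement.Parse` throws an `XmlException`. The following is only a minimal sketch of how the same idea could be wrapped in a reusable method that tolerates that failure. The names `ProductNumberExtractor` and `Extract` are hypothetical, and the sketch assumes each number is the last text node inside its `<td>`. For pages that are not valid XML, a dedicated HTML parser is probably the better choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class ProductNumberExtractor
{
    // Hypothetical helper: collects the number found in each <td> of the fragment.
    // Assumes the fragment is well-formed XML once wrapped in a single root element.
    public static List<int> Extract(string fragment)
    {
        XElement root;
        try
        {
            root = XElement.Parse("<Root>" + fragment + "</Root>");
        }
        catch (XmlException)
        {
            // Markup that is not well-formed XML (for example an unclosed tag) ends up here.
            return new List<int>();
        }

        var numbers = new List<int>();
        foreach (XElement cell in root.Descendants("td"))
        {
            // Take the last text node that is a direct child of <td>; the label text lives inside <b>.
            XText text = cell.Nodes().OfType<XText>().LastOrDefault();
            int value;
            if (text != null && int.TryParse(text.Value.Trim(), out value))
            {
                numbers.Add(value);
            }
        }
        return numbers;
    }
}

class Program
{
    static void Main()
    {
        string input =
            "<pre> <td><b>product file number </b> 7269</td> </pre>" +
            "<pre> <td><b>product file number </b> 7562</td> </pre>" +
            "<pre> <td><b>product file number </b> 7502</td> </pre>";

        // Prints: 7269, 7562, 7502
        Console.WriteLine(string.Join(", ", ProductNumberExtractor.Extract(input)));
    }
}
```

Because `int.TryParse` is used instead of `int.Parse`, a cell whose trailing text is not a number is skipped rather than raising an exception. Returning an empty list on malformed input is a design choice made for this sketch; rethrowing or logging may suit a real application better.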
jdweng