
I have the source code of a web page and I want to extract the text that comes right after vi-buybox-watchcount">.


The number 152 appears immediately after vi-buybox-watchcount">, and that is the value I want to extract.

The only approach I know is Split, but I cannot split on '>' because the source contains many '>' characters followed by digits.

So I tried to split on the whole marker as follows, but it produces errors:

for (int i = 0; i < Convert.ToInt32(idlist.Length); i++)
{
    string url = "http://www.ebay.com/itm/" + idlist[i];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    // richTextBox2.Text += sr.ReadToEnd();
    string a = sr.ReadToEnd();
    sr.Close();
    string source = null;
    source = string.Join(Environment.NewLine,
        a.Split('vi-buybox-watchcount">') // this is getting errors
            .Where(m => m.All(char.IsDigit)));
}

Please suggest a way to extract this number.

  • Search for _vi-buybox-watchcount">_ to find the start of the number, then search for _<_ to find the end of the number. – PaulF Dec 20 '17 at 16:33
  • Basically this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 True not only for regexes, but for home-made split-based parsers. –  Dec 20 '17 at 16:35
  • How do I search for these words? I have no idea how to do it; I am a complete beginner at this. – Brian Thomas Dec 20 '17 at 16:36
  • If OP knows exactly what is being looked for then _"parsing HTML"_ is not relevant. – PaulF Dec 20 '17 at 16:36
  • @Ivan The OP never mentioned regex. – bornfromanegg Dec 20 '17 at 16:38
  • @bornfromanegg Same applies to home-made split-based parsers. –  Dec 20 '17 at 16:39
  • @BrianThomas: Use [IndexOf(string)](https://msdn.microsoft.com/en-us/library/k8b1470s(v=vs.110).aspx) for the first search, [IndexOf(string,int)](https://msdn.microsoft.com/en-us/library/7cct0x33(v=vs.110).aspx) for the second search, and [Substring](https://msdn.microsoft.com/en-us/library/aka44szs(v=vs.110).aspx) to extract the number. – PaulF Dec 20 '17 at 16:39
  • ok Paul I will see what you have sent – Brian Thomas Dec 20 '17 at 16:42
  • Could CSSSelectors be the way to go here? http://simontimms.com/2014/02/24/parsing-html-in-c-using-css-selectors/ – bornfromanegg Dec 20 '17 at 16:43

2 Answers


Something like this:

string strHTML = "..................<span class=\"vi-buybox-watchcount\">152</span>";

// Marker that immediately precedes the number.
string strFind = "vi-buybox-watchcount\">";
int startIndex = strHTML.IndexOf(strFind) + strFind.Length;
int endIndex = strHTML.IndexOf("<", startIndex);
string reqValue = strHTML.Substring(startIndex, endIndex - startIndex);

IndexOf finds the position where the search string starts, so add the length of that string to get the start of the value. The second IndexOf call finds the next "<" after that point, and the difference between the two positions is the length to pass to Substring.

You may want to add error checking in case either string is not found: IndexOf returns -1 in that case.

If there are multiple occurrences, you could use a loop and the IndexOf overload that takes a start position, passing the last endIndex found (initialised to zero) as the second parameter.
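The error checking and the loop over multiple occurrences described above could be sketched as follows. This is only an illustration: the class name `WatchCountExtractor` and the method `ExtractAll` are hypothetical, and the sketch assumes the value always ends at the next "<".

```csharp
using System;
using System.Collections.Generic;

public static class WatchCountExtractor
{
    // Hypothetical helper: returns every value found between `marker`
    // and the next '<' that follows it.
    public static List<string> ExtractAll(string html, string marker)
    {
        var values = new List<string>();
        int searchFrom = 0; // position where the next search starts

        while (true)
        {
            int markerIndex = html.IndexOf(marker, searchFrom, StringComparison.Ordinal);
            if (markerIndex < 0)
                break; // IndexOf returned -1: no further occurrences

            int startIndex = markerIndex + marker.Length;
            int endIndex = html.IndexOf("<", startIndex, StringComparison.Ordinal);
            if (endIndex < 0)
                break; // no terminating '<' after the marker: stop

            values.Add(html.Substring(startIndex, endIndex - startIndex));
            searchFrom = endIndex; // continue after the value just read
        }

        return values;
    }
}
```

Called as `WatchCountExtractor.ExtractAll(a, "vi-buybox-watchcount\">")`, this returns an empty list when the marker is absent, so the caller can check `Count` instead of catching an exception from Substring.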

A possible LINQ-only solution could be:

strHTML.Split(new string[]{strFind}, StringSplitOptions.RemoveEmptyEntries)
    .Where(x => char.IsDigit(x[0]))
    .Select(y => y.Substring(0,y.IndexOf("<")));

Or

strHTML.Split(new string[]{strFind}, StringSplitOptions.RemoveEmptyEntries)
    .Skip(1)
    .Select(y => y.Substring(0,y.IndexOf("<")))
    .Where(m => m.All(char.IsDigit));

Use the second form if you want only numeric values.

PaulF

What about using a regular expression instead?

using System.Text.RegularExpressions;

String html = ... // your html text
String number = String.Empty; // default value if not found

// Capture the digits between the opening span tag and its closing tag.
Match m = Regex.Match(html, @"<span class=""vi-buybox-watchcount"">([0-9]+)</span>");

if (m.Success)
    number = m.Groups[1].Value;
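
If the count is needed as an integer rather than a string, a variant along these lines might help. It is a sketch under assumptions: `WatchCountRegex` and `TryGetWatchCount` are hypothetical names, and the pattern assumes the digits follow the `vi-buybox-watchcount">` marker directly (optionally after whitespace), as in the question.

```csharp
using System.Text.RegularExpressions;

public static class WatchCountRegex
{
    // Assumption: the count is a run of digits right after the marker.
    private static readonly Regex Pattern =
        new Regex(@"vi-buybox-watchcount"">\s*([0-9]+)", RegexOptions.CultureInvariant);

    // Hypothetical helper: returns true and sets `count` when a number is found.
    public static bool TryGetWatchCount(string html, out int count)
    {
        count = 0; // value reported when nothing is found
        Match m = Pattern.Match(html);
        return m.Success && int.TryParse(m.Groups[1].Value, out count);
    }
}
```

The Try-pattern keeps the "not found" case explicit, so the calling loop can skip a page without a watch count instead of working with an empty string.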
Tommaso Belluzzo