I want to copy a value from another website and store it in my application. The value is inside an anchor (<a>) tag in a table row, next to a <th>. I can already extract h3/h2 content, but not the <a> content.

Below is the HTML of the other website, as shown in the F12 developer tools. From this I want to extract 1,945, which is in the anchor near the bottom.

<tr class="zebra">
<th>Total Backlinks</th>
<td>
<span class="tooltip-from-element" data-tooltip-position="lefttop" data-target-position="rightmiddle" data-tooltip-id="tooltip_overview_total_backlinks" name="total_backlinks">
<a onclick="ClearInfoAndDataTable();" href="/xyz.com?target=www.homeocare.in">1,945</a>
</span>
</td>
</tr>

I'm using this code to get that data:

        string url = txthack.Text.Trim();
        string strurl = "https://mywebsitedomains?target= " + url + ""; //&warning=0&profile=css2";

        StreamReader stream = objm.URLServerRequest(strurl);

        string myResponse = stream.ReadToEnd();

        MatchCollection AltTag = Regex.Matches(myResponse, "(<h3.*?>)(.*?)(</h3>)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
        string s = AltTag[1].ToString();
  • What do you want to get? Can you please write it down exactly? The value 1,945? – Val Nolav Apr 22 '15 at 08:51
  • I want to get the value 1,945 from the other website. One of the anchor tags contains 1,945; I want to go through all the anchor tags and pick out my value. –  Apr 22 '15 at 08:57
  • Use `@"(?s)(?<=).*?(?=)"`. Verbatim string literals are useful in C# when defining regex patterns. – Wiktor Stribiżew Apr 22 '15 at 09:01

2 Answers

I believe regex is not well suited to web scraping or similar tasks. As Val Nolav suggested, there are libraries designed for such scenarios; CsQuery and HtmlAgilityPack are two well-known ones.

Here is a small example that uses CsQuery (its syntax is compatible with jQuery selectors) to extract the text of the h3, h2, th and a tags.

        var cq = CsQuery.CQ.Create(@"<tr class=""zebra"">
                <th>Total Backlinks</th>
                <td>
                <span class=""tooltip-from-element"" data-tooltip-position=""lefttop"" data-target-position=""rightmiddle"" data-tooltip-id=""tooltip_overview_total_backlinks"" name=""total_backlinks"">
                <a onclick=""ClearInfoAndDataTable();"" href=""/xyz.com?target=www.homeocare.in"">1,945</a>
                </span>
                </td>
                </tr>");

        var texts = cq["th,a,h3,h2"].Select(a => a.InnerText).ToList();

CsQuery can be installed from NuGet with the command `Install-Package CsQuery -Version 1.3.4`.

user3473830

You can try the following to capture the text inside the tag with a regex. However, if your project needs to scrape data from websites, I would recommend HtmlAgilityPack.

  string a = "<a onclick=\"ClearInfoAndDataTable();\" href=\"/xyz.com?target=www.homeocare.in\">1,945</a>";
  // Group 1 is the opening <a ...> tag; group 2 is the text up to the next '<'.
  string rs = "(<a[^>]*>)([^<]*)";

  System.Text.RegularExpressions.Match match = System.Text.RegularExpressions.Regex.Match(a, rs);
  Console.WriteLine(match.Groups[2].Value);
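
The same idea can be applied to a whole downloaded page rather than a hard-coded string. The following is only a rough sketch using the standard library. It assumes the target page is served as static HTML and that the span keeps its name="total_backlinks" attribute; both are assumptions about the other site, not something I can confirm. The `html` variable stands in for the text you already read with `stream.ReadToEnd()`, and the names `pattern`, `backlinksText` and `backlinks` are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Stand-in for the downloaded page; in the question this would be myResponse.
string html = @"<tr class=""zebra"">
<th>Total Backlinks</th>
<td>
<span class=""tooltip-from-element"" name=""total_backlinks"">
<a onclick=""ClearInfoAndDataTable();"" href=""/xyz.com?target=www.homeocare.in"">1,945</a>
</span>
</td>
</tr>";

// Find the span by its name attribute (assumed stable), then capture the text
// of the first <a> that follows it. Singleline lets '.' match line breaks.
var pattern = new Regex(
    @"<span\b[^>]*\bname=""total_backlinks""[^>]*>.*?<a\b[^>]*>\s*(?<value>[^<]*?)\s*</a>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Match match = pattern.Match(html);
string backlinksText = match.Success ? match.Groups["value"].Value : "";

// "1,945" contains a thousands separator, so allow it explicitly when parsing.
bool parsed = int.TryParse(backlinksText, NumberStyles.AllowThousands,
                           CultureInfo.InvariantCulture, out int backlinks);

Console.WriteLine(parsed ? backlinks.ToString() : "value not found");
```

Because the pattern keys on the span's name rather than on the number itself, whatever value the page currently shows is what gets captured. If the site changes its markup, or fills the value in with JavaScript after the page loads, this will stop matching, which is one more reason to prefer an HTML parser for anything long-lived.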
Val Nolav
  • This does not meet my requirement. I want to get the value "1,945" dynamically: whatever value is on the page should be returned. My aim is to get that value by web scraping. –  Apr 22 '15 at 09:34