I have an HTML document and want to extract some information from it, so I am using HtmlAgilityPack. With the following code I get all the data I need.

    var web = new HtmlWeb();
    var doc = web.Load("http://www.talentsearchpeople.com/en/jobs/?page=joblisting&pubID=&formID=&start=0&count=8&module=&functionLevel1=&provinceNode=&countryNode=&keyword=");

    var nodes = doc.DocumentNode.SelectNodes("//a[@class='grijs'][@title]");

    foreach (var node in nodes)
    {
        HtmlAttribute att = node.Attributes["title"];
        title = att.Value;
        Response.Write("<br/>" + att.Value);
    }

    var Location = doc.DocumentNode.SelectNodes("//td[@width='80']");

    foreach (var node in Location)
    {
        if (node.InnerHtml.Contains("Location:"))
        {
            locationname = HttpUtility.HtmlDecode(node.NextSibling.NextSibling.InnerText.Trim());
            Response.Write("<br/>Location1=" + locationname);
        }
    }

With the above code I get the following output:

Lead Buyer South

Customer Service Order Management with native level of German

EMEA Customer Experience & Quality Internship

Service Desk Team Leader with Excellent Level of German and French

Sourcing & Procurement Consultant with native level of French

Jefe/a de ventas con alemán e inglés. Recien Titulados.

Jefe/a de ventas con alemán e inglés. Recien Titulados.

Jefe/a de ventas con alemán e inglés. Recien Titulados.

Location1=Almeria

Location1=Terrassa

Location1=United Kingdom, Manchester

Location1=Barcelona

Location1=Barcelona

Location1=A Coruña

Location1=Cataluña

Location1=Murcia

The code above fetches the data correctly. The problem is that I want to insert this data into a database and also display it in the right order, with each job title followed by its location, like this:

Lead Buyer South Location1=Almeria

Customer Service Order Management with native level of German Location1=Terrassa

EMEA Customer Experience & Quality Internship Location1=United Kingdom, Manchester

Service Desk Team Leader with Excellent Level of German and French Location1=Barcelona

Sourcing & Procurement Consultant with native level of French Location1=Barcelona

Jefe/a de ventas con alemán e inglés. Recien Titulados. Location1=A Coruña

Jefe/a de ventas con alemán e inglés. Recien Titulados. Location1=Cataluña

Jefe/a de ventas con alemán e inglés. Recien Titulados. Location1=Murcia

Alternative method: searching by the table tag

    var web = new HtmlWeb();
    var doc = web.Load("http://www.talentsearchpeople.com/en/jobs/?page=joblisting&pubID=&formID=&start=0&count=8&module=&functionLevel1=&provinceNode=&countryNode=&keyword=");
    var mainNode = doc.DocumentNode.SelectNodes("//table[@class='border-jobs']/*");
    foreach (var mainNodes in mainNode)
    {
        string pathdet = mainNodes.XPath;
        var nodes = mainNodes.SelectSingleNode("//a[@class='grijs'][@title]");
        if (nodes != null)
        {
            HtmlAttribute att = nodes.Attributes["title"];
            title = att.Value;
            Response.Write("<br/>" + att.Value);
        }

        var Description = doc.DocumentNode.SelectSingleNode("//td[@colspan='2']");
        if (Description.InnerHtml.Contains("Description:"))
        {
            s = Description.InnerHtml;
            s = s.Replace("Description:", "");
            Response.Write("<br/>Description=" + s);
        }

        var Location = doc.DocumentNode.SelectSingleNode("//td[@width='80']");
        if (Location.InnerHtml.Contains("Location:"))
        {
            locationname = HttpUtility.HtmlDecode(Location.NextSibling.NextSibling.InnerText.Trim());
            Response.Write("<br/>Location1=" + locationname);
        }
    }

If I use the above code, I get the following output:

Assistant Call Centre Manager with fluent level of Spanish and English

Description= We are recruiting an Assistant Call Center Manager for a multinational company based in Lisboa, Portugal. This person will be responsible for the team management. Experience in team management, mainly in contact center, environment is required.

Location1=Lisboa, Portugal

I get the above output 8 times, because //table[@class='border-jobs']/* matches 8 nodes in the document.

How can I get the correct output?

user2240189

2 Answers


At a glance, it looks like you may get away with just storing both sets of values in arrays and then, when outputting, taking one item from each array.
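
A minimal sketch of that pairing, assuming the titles and locations have already been collected into two lists in page order. The class name `JobPairing`, the method name `Pair` and the " | " separator are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JobPairing
{
    // Pairs the i-th title with the i-th location (hypothetical helper).
    // Zip stops at the shorter sequence, so if one job has no location,
    // every later title ends up paired with the wrong location.
    public static List<string> Pair(IEnumerable<string> titles, IEnumerable<string> locations)
    {
        return titles.Zip(locations, (t, l) => t + " | " + l).ToList();
    }
}
```

Calling `JobPairing.Pair(titles, locations)` only gives correct pairs when both lists have the same length and order, which is the weakness of this quick approach.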

More robustly and more correctly, you should refine your searches so that you find the HTML element that holds both pieces of information (e.g. search for tables with class "border-jobs"). That table contains both the job title and the location, so you can get the two pieces of data from it at the same time.

This technique is better because it copes with cases such as no location being specified, and it better reflects what you are actually doing, so it will be easier for the next person who comes along to understand.

Addition

To answer your additional issues this line:

    var Description = doc.DocumentNode.SelectSingleNode("//td[@colspan='2']");

will search the whole document. To make it search only the contents of the right node, you need:

    var Description = mainNodes.SelectSingleNode(".//td[@colspan='2']");

Note the change of object (which you are already aware of from the comments) as well as the addition of the . in the XPath, which tells it to start at the current node.

Also, your title select will not find anything valid in that node, so you will need to update the XPath. Changing it to .//a will work since it is the first anchor tag, but this might be a bit brittle.
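
The difference between `//` and `.//` can be reproduced without HtmlAgilityPack. The sketch below uses `System.Xml` from the base class library purely as a stand-in, on the assumption that HtmlAgilityPack applies the same standard XPath rule (which matches the behaviour described in this thread). The sample markup, the class name `XPathScopeDemo` and the method name `Extract` are hypothetical, and the real page structure may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathScopeDemo
{
    // Hypothetical, simplified markup: two job tables, each with a title link and a location row.
    public const string Sample =
        "<root>" +
        "<table class='border-jobs'><tr><td><a class='grijs' title='Job A'>A</a></td></tr>" +
        "<tr><td width='80'>Location:</td><td>Almeria</td></tr></table>" +
        "<table class='border-jobs'><tr><td><a class='grijs' title='Job B'>B</a></td></tr>" +
        "<tr><td width='80'>Location:</td><td>Terrassa</td></tr></table>" +
        "</root>";

    // relative == false uses "//..." (searches from the document root),
    // relative == true uses ".//..." (searches below the current table only).
    public static List<string> Extract(string xml, bool relative)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        string prefix = relative ? ".//" : "//";
        var results = new List<string>();
        foreach (XmlNode table in doc.SelectNodes("//table[@class='border-jobs']"))
        {
            XmlNode link = table.SelectSingleNode(prefix + "a[@class='grijs'][@title]");
            XmlNode place = table.SelectSingleNode(
                prefix + "td[@width='80'][contains(., 'Location:')]/following-sibling::td[1]");
            if (link == null || place == null) continue; // skip tables missing either piece
            results.Add(link.Attributes["title"].Value + " | " + place.InnerText.Trim());
        }
        return results;
    }
}
```

With the sample markup, `Extract(Sample, false)` returns "Job A | Almeria" twice, because each absolute query restarts at the document root, while `Extract(Sample, true)` returns "Job A | Almeria" and "Job B | Terrassa". Selecting the location cell with `following-sibling::td[1]` is a design choice made here to avoid depending on whitespace nodes, unlike `NextSibling.NextSibling`.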

Chris
  • Yes, I have tried searching for tables with class "border-jobs". I will edit the question so that you can check the code – user2240189 Nov 14 '13 at 12:46
  • Please check the alternative method – user2240189 Nov 14 '13 at 12:50
  • In your alternative method your description and location seem to be using `doc` as their source whereas you should be searching the table you found and stored in `mainNodes` (and your pluralisation is really confusing :) ). I can't see why your titles would repeat though. :( – Chris Nov 14 '13 at 13:04
  • Yes, I tried it both ways, using mainNodes and doc, but I am still getting the same output. Even the same title is repeated 8 times – user2240189 Nov 15 '13 at 03:58
  • Well I can see that the problem is that in the line `var nodes = mainNodes.SelectSingleNode("//a[@class='grijs'][@title]");` it seems to be returning a node outside of mainNodes (confirmed by looking at the XPath property of each). This isn't what I'd expect personally and I'm not an expert in either HTMLAgilityPack or XPath so I can't really help more. It may be worth asking a more focussed question concentrating on just that aspect of the problem though since it sounds like the rest of your methodology is ok. Sorry I can't help more. – Chris Nov 15 '13 at 09:41
  • In fact http://stackoverflow.com/questions/15185404/html-agility-pack-selectsinglenode-giving-always-same-result-in-iteration has the answer to the question. It doesn't implicitly root from your current node it seems. – Chris Nov 15 '13 at 09:43

I got the answer. :) The problem was that an expression starting with // searches the entire page, so SelectSingleNode returned the first td[@colspan='2'] on the page, not the one in the current table. Putting "." in front of the expression starts the search at the current node, so var Description = mainNodes.SelectSingleNode(".//tr//td//table//tr//td[@colspan='2']"); selects only descendants of the mainNodes node.

user2240189