If you just want the href attribute of the "Visit Website" button, use this:
Company_URL = sel.xpath("//div[@id = 'tabs-1']/p[3]/a/@href").extract_first()
But the above code will only return this:
act_open_company_page.cfm?url_id=70098
That is because the URL of the company (i.e. 'https://www.europlacer.com/') is NOT stored directly in the href attribute (it is resolved later using JavaScript). But if you look closely at the source:
<a onclick="return trackOutboundLink('company_url','http://www.europlacer.com','49509');" href="act_open_company_page.cfm?url_id=70098" target="_blank" class=""><img src="/images/buttons/visit-website.jpg" alt="Visit EUROPLACER website" class=""></a>
You can see that the direct URL is present as an argument to the function in the onclick attribute, so you need to extract it from there. First, to get the onclick attribute's value, do this:
URL = sel.xpath("//div[@id = 'tabs-1']/p[3]/a/@onclick").extract_first()
Then, extract your required URL from it like this:
URL = URL.split(",")[1]
URL = URL.strip("'")  # remove the leading and trailing quotes
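If splitting on commas feels fragile (for example, if the argument order ever changes), a regular expression is one alternative. This is only a sketch: the helper name extract_outbound_url is made up for illustration, and the onclick string is the one copied from the page source above. It uses nothing but the standard library, so you can feed it whatever your onclick XPath returns.

```python
import re

# The onclick value as it appears in the page source quoted above.
onclick = "return trackOutboundLink('company_url','http://www.europlacer.com','49509');"


def extract_outbound_url(onclick_value):
    """Return the first single-quoted http(s) URL in an onclick string, or None.

    Hypothetical helper, not part of Scrapy.
    """
    if not onclick_value:
        # extract_first() returns None when the XPath matches nothing.
        return None
    match = re.search(r"'(https?://[^']+)'", onclick_value)
    return match.group(1) if match else None


company_url = extract_outbound_url(onclick)
print(company_url)  # http://www.europlacer.com
```

The None check matters because extract_first() gives you None rather than raising when the link is missing on a page.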
Another method to extract the URL would be to actually resolve the value of the href attribute. When you click on the link, the browser opens something like:
http://www.smtnet.com/company/act_open_company_page.cfm?url_id=70098
So the trick would be to prepend the site's base URL ("http://www.smtnet.com/company/") to the href, request the resulting URL, and then read the final URL once it redirects. But the first method I described in my answer would be a lot easier.
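For the first half of that second method, building the absolute URL, here is a minimal sketch using the standard library. The base URL is an assumption inferred from the resolved link shown above, not something read from the page, so adjust it to the URL of the page you actually scraped.

```python
from urllib.parse import urljoin

# Assumed base: the directory part of the resolved link shown above.
base_url = "http://www.smtnet.com/company/"

# The relative href returned by the first XPath in this answer.
href = "act_open_company_page.cfm?url_id=70098"

# urljoin resolves the relative href against the base, like a browser would.
absolute_url = urljoin(base_url, href)
print(absolute_url)
# http://www.smtnet.com/company/act_open_company_page.cfm?url_id=70098
```

You would still have to request that URL and see where it lands. I have not checked whether the site redirects over HTTP or through JavaScript, so treat that part as untested.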
Additionally, for the company name, I think you should try this:
Company_Name = sel.xpath('//header/h1/text()').extract_first()
The above line returns only the company name (i.e. "EUROPLACER"), whereas your code picks up some extra text as well.