
I've been using Scrapy to scrape content, but I'm having trouble getting links from this particular site: Taleo Site

Clicking the title takes you to the job description. But the href is set to '#'.

And the onclick event is:

onclick="javascript:setEvent(event);requisition_openRequisitionDescription('requisitionListInterface','actOpenRequisitionDescription',_ftl_api.lstVal('requisitionListInterface', 'requisitionListInterface.listRequisition', 'requisitionListInterface.ID1380', this),_ftl_api.intVal('requisitionListInterface', 'requisitionListInterface.ID1384', this));return ftlUtil_followLink(this);"

Also, the job description URL is the same for every job. All of the description links point to:

https://cantire.taleo.net/careersection/2/jobdetail.ftl

I've been using Scrapy for a while and would like to follow the link and scrape the content. I'm just having trouble with this sort of setup, where the href attribute is '#' and the link is created by JavaScript.

In the past I would do the following to get links and follow them, but that doesn't work in this case.

item['link'] = sel.xpath('@href').extract()[0]
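
For reference, the same kind of selector can read the onclick attribute instead of the href (for example `sel.xpath('@onclick')`). The sketch below uses only the standard library and a copy of the onclick value shown above to pull out the quoted identifiers. The helper name `quoted_arguments` is made up for illustration, and whether any of these identifiers correspond to request parameters is an assumption that would need checking against the site.

```python
import re

# Copy of the onclick value from the question, split across lines for readability.
ONCLICK = (
    "javascript:setEvent(event);"
    "requisition_openRequisitionDescription('requisitionListInterface',"
    "'actOpenRequisitionDescription',"
    "_ftl_api.lstVal('requisitionListInterface', "
    "'requisitionListInterface.listRequisition', "
    "'requisitionListInterface.ID1380', this),"
    "_ftl_api.intVal('requisitionListInterface', "
    "'requisitionListInterface.ID1384', this));"
    "return ftlUtil_followLink(this);"
)


def quoted_arguments(onclick):
    """Return every single-quoted string literal found in an onclick value."""
    return re.findall(r"'([^']*)'", onclick)


args = quoted_arguments(ONCLICK)
for arg in args:
    print(arg)
```

This only shows what the attribute contains; it does not produce a followable URL on its own.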

How can I fix this? Thanks

Jeann Pierre

2 Answers


The browser sends a POST request instead of a GET when you click on the job title. That's why the link is the same for every job: only the POST request parameters differ between job listings.

When you check the browser's network console, you can see a POST request being sent to https://cantire.taleo.net/careersection/2/jobdetail.ftl with various form data (key-value pairs). You can send the same POST request, with all of its parameters, using Scrapy's FormRequest (a plain scrapy.Request does not accept a formdata argument); the response will be the job description page.

The POST request will be something like:

yield scrapy.FormRequest(url="https://cantire.taleo.net/careersection/2/jobdetail.ftl", method="POST", formdata={'key': 'value'}, callback=self.parse)
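
The key-value pairs usually have to be assembled from the listing page plus whatever identifies the chosen job. As a rough sketch of that step, using only the standard library: the form markup, the field names (`csrftoken`, `selectedJob`, `keyword`), the job value `12345`, and the helper `build_formdata` are all hypothetical placeholders, and the real names must be copied from the network console.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

JOB_DETAIL_URL = "https://cantire.taleo.net/careersection/2/jobdetail.ftl"

# Hypothetical stand-in for the listing page's form; not the real Taleo markup.
SAMPLE_FORM = """
<form method="post" action="jobdetail.ftl">
  <input type="hidden" name="csrftoken" value="abc123">
  <input type="hidden" name="selectedJob" value="">
  <input type="text" name="keyword" value="ignored">
</form>
"""


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and attr.get("type") == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_formdata(html, overrides):
    """Start from the page's hidden inputs, then apply per-job overrides."""
    collector = HiddenInputCollector()
    collector.feed(html)
    formdata = dict(collector.fields)
    formdata.update(overrides)
    return formdata


formdata = build_formdata(SAMPLE_FORM, {"selectedJob": "12345"})
body = urlencode(formdata)  # the form-encoded body a POST would carry
print(JOB_DETAIL_URL)
print(body)
```

A dict like `formdata` is what would be passed as the `formdata` argument above. Scrapy's `FormRequest.from_response` can also pre-populate fields from a form found in the response, which may save the manual collection step.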
Rahul

I highly recommend using Selenium + Scrapy for this. This way you can easily deal with both issues of clicking through and rendering dynamic content. Helpful links here.

Benjamin James