I am trying to write an automated PHP script to scrape and extract all 'Job IDs' (3262, 3197, 3196 etc.) from the URL https://nforlanwebdmz.phs.org/ltmprd/CandidateSelfService/controller.servlet?dataarea=ltmprd&context.session.key.HROrganization=90&context.session.key.JobBoard=EXTPHYS&context.session.key.noheader=true.

However, this is not straightforward because the required data does not appear in the page's HTML source. I also inspected 'Developer Tools->Network' in different browsers, but could not locate the request that loads the data.

Any help would be highly appreciated.

Thanks & Regards!

Sam

2 Answers

I took a look at Developer Tools > Network in Chrome and found this API URL: https://nforlanwebdmz.phs.org/ltmprd/soapExt/ldrest/JobPosting/JobPostingListWebServices_ListOperation?JobBoard=EXTPHYS&LocationOfJob=+&Category=+&WorkType=+&JobRequisition=+&Description_translation_=+&JobPosting=+&PostingStatus=2&PostingDateRange.Begin=+&PostingDateRange.End=+&JobRequisitionPriority=+&csk.IsoLocale=en&HROrganization=90&limit=-1&=1486230138234

The "Job ID" is the "JobRequisition" field in the JSON data that this URL returns.
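
As a rough sketch of how that endpoint could be consumed (written in Python here for illustration; the same steps map onto PHP's `file_get_contents` and `json_decode`), the snippet below downloads the JSON and collects every `JobRequisition` value. The layout of the response is not shown in this thread, so the code assumes nothing beyond the key name and simply walks the whole structure. `extract_job_ids` and `fetch_job_ids` are illustrative names, not part of any API, and the sample data is made up.

```python
import json
from urllib.request import urlopen


def extract_job_ids(node, key="JobRequisition"):
    """Collect every scalar value stored under `key`, at any nesting depth."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and not isinstance(value, (dict, list)):
                # Normalise to strings so numeric and text IDs compare alike.
                found.append(str(value))
            else:
                found.extend(extract_job_ids(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(extract_job_ids(item, key))
    return found


def fetch_job_ids(url, timeout=30):
    """Download the JSON document at `url` and return the job IDs found in it.

    Assumption: the endpoint answers a plain GET with UTF-8 encoded JSON.
    """
    with urlopen(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_job_ids(payload)


if __name__ == "__main__":
    # Made-up sample: the real response layout was not shown in the thread.
    sample = {
        "results": [
            {"JobRequisition": 3262, "Description": "A"},
            {"JobRequisition": "3197"},
            {"nested": {"JobRequisition": 3196}},
        ]
    }
    print(extract_job_ids(sample))
```

Passing the API URL quoted above to `fetch_job_ids` should return the list of IDs, provided the endpoint still answers unauthenticated GET requests with JSON.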

Afif Zafri
  • Yes Afif @afifzafri, this indeed is the URL I was looking for. But can you please tell me, what could be the reason that I could not find the same URL in Developer Tools->Network on Chrome? Do you use any special filters in Developer Tools? Thanks! – Sam Feb 04 '17 at 18:13
  • I'm not sure why you did not get the URL. No, I don't use any special filters in Developer Tools. All I did was open the link you gave, open Developer Tools > Network, click on the "XHR" tab and reload the page, then wait until the page had fully loaded. Try it several times; sometimes the dev tools do not "catch" the URL properly. – Afif Zafri Feb 04 '17 at 18:14
  • Can you please tell me which version of Chrome you are using? I have Version 55.0.2883.87 m. Thanks! – Sam Feb 04 '17 at 18:20
  • Strange, but I do not see this particular API URL in either Chrome or Firefox [51.0.1 (32-bit)]. – Sam Feb 04 '17 at 18:27
  • Yes I'm also using the same Chrome version, Version 55.0.2883.87 m (64-bit) – Afif Zafri Feb 04 '17 at 18:29
  • Can you also please help me out finding the source of data on URL https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/? Regards! – Sam Feb 04 '17 at 18:35
  • I found another way to get the URL: open the page, then open Developer Tools > Network. On the website there is a "Search Jobs" button; click it and the URL will show up :) Maybe I clicked it by accident the first time. – Afif Zafri Feb 04 '17 at 18:36
  • Yes Afif, clicking on the 'Search Jobs' button brings that URL. Please let me know if you find something for the other URL https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/. Thanks! – Sam Feb 04 '17 at 18:43
  • It seems I cannot find the API URL for the new website you gave. All I can see is that the initiator is a file named "WorkdayApp", but I can't find it in the source. – Afif Zafri Feb 04 '17 at 18:51
  • I suggest you try using a library to scrape the content. You could try SimpleHTMLDom, as suggested by Muhammad Athar. I previously managed to scrape a website that loads its data via AJAX using Selenium with Python. – Afif Zafri Feb 04 '17 at 18:55
  • Even if I use some library such as SimpleHTMLDom, will I be able to scrape content whose source is unknown? How will I provide that invisible content as input to SimpleHTMLDom class? – Sam Feb 05 '17 at 02:21
  • Try looking at a headless browser; that is the way to scrape websites that use AJAX. Read http://stackoverflow.com/questions/260540/how-do-you-scrape-ajax-pages – Afif Zafri Feb 05 '17 at 03:42

Try SimpleHTMLDom to scrape data from any DOM object. You can download it from http://sourceforge.net/projects/simplehtmldom/