
I've been using the PHP Simple HTML DOM parser to do a web page scrape.

For example, using a known genetic variant code, e.g. "rs4343", I would fetch the product page from this link:

https://www.thermofisher.com/order/genome-database/searchResults?searchMode=keyword&productTypeSelect=genotyping&keyword=rs4343

To obtain the reagent product code, I would use a regex (e.g. '/C_[\S]+/') to locate the item within the HTML, in this case "C__11942562_20".
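For reference, the extraction step looked roughly like the sketch below. The HTML fragment and the function name `extractAssayId` are made up for illustration; only the pattern and the product code come from the description above.

```php
<?php
// Hypothetical HTML fragment standing in for the old, server-rendered page.
$html = '<td class="assay-id"> C__11942562_20 </td>';

// Return the first run of "C_" followed by non-whitespace characters,
// or null when the pattern does not occur in the markup.
function extractAssayId(string $html): ?string
{
    return preg_match('/C_[\S]+/', $html, $m) === 1 ? $m[0] : null;
}

echo extractAssayId($html), PHP_EOL;
```

Because `\S+` is greedy, the match runs up to the next whitespace character, so this only gives a clean code when the code is delimited by whitespace in the markup.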

However, the web page has changed and now loads its content with JavaScript, so the scrape no longer works.

I've tried fetching the page with PHP's cURL functions, but this also failed for the same reason as the parser.

Using Firefox, I identified the API endpoint (https://www.thermofisher.com/order/genome-database/api/v2/search), the method (POST), the minimum viable header and the JSON parameter.

Presumably this will work with cURL, but is there an easier way using PHP?
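The cURL route I have in mind would look something like the sketch below. The endpoint and the POST method are the ones identified in Firefox; everything else is an assumption for illustration. In particular, the JSON field names are simply copied from the old search URL's query string, the headers are a guess at a minimal set, and the function names are hypothetical. It requires PHP's curl extension.

```php
<?php
const SEARCH_API = 'https://www.thermofisher.com/order/genome-database/api/v2/search';

// Build the cURL options for one keyword search.
function buildSearchOptions(string $keyword): array
{
    // ASSUMPTION: these field names mirror the old URL's query string;
    // the real JSON parameter may be shaped differently.
    $payload = json_encode([
        'searchMode'        => 'keyword',
        'productTypeSelect' => 'genotyping',
        'keyword'           => $keyword,
    ]);

    return [
        CURLOPT_URL            => SEARCH_API,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        // ASSUMPTION: a guessed minimal header set for a JSON API.
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json', 'Accept: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ];
}

// Send the request and return the decoded JSON, or null on any failure.
function searchAssays(string $keyword): ?array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildSearchOptions($keyword));
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}
```

Calling `searchAssays('rs4343')` would then return the decoded response as an array, and the product code would be read from that array instead of being pulled out of HTML with a regex.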

I understand that PHP would otherwise need to create an instance of, or emulate, a JavaScript-enabled browser in order to render the result.

I'm speculating here, but is it possible to use the client side (the browser) to do this in the background, perhaps in an invisible frame?

haz
  • @Nick, note _using PHP_ not JS, cheers – haz Mar 31 '19 at 07:32
  • So is OP in the first dupe (see the second answer for how to use PhantomJS with PHP), and the second dupe is there to offer some more packages (e.g. Selenium works well with PHP and good examples of how to use it [here](https://stackoverflow.com/questions/6590360/how-to-use-selenium-with-php)) – Nick Mar 31 '19 at 07:35
  • PhantomJS is no longer supported, which is a concern, and the Selenium framework is far too big an investment in deployment. Perhaps there is no solution at present? – haz Mar 31 '19 at 07:43
  • I will reopen but it's likely someone else will close with the same dupes... – Nick Mar 31 '19 at 07:48
  • @Nick I did a fairly exhaustive search, but there's usually a novel idea lurking – haz Mar 31 '19 at 08:29
  • I don't understand why selenium is too big an investment in deployment. It would take me maybe half an hour tops. – pguardiario Mar 31 '19 at 23:49
  • @pguardiario Firstly, it's a testing framework, and secondly, it requires a server install and a client install. I am aiming to scrape using PHP server side and optionally JS client side. As mentioned, there does not appear to be a PHP-based solution – haz Apr 01 '19 at 05:11
  • Are you aware of the PHP selenium bindings from facebook? I'm still not sure what the big deal is. – pguardiario Apr 01 '19 at 08:35
  • It basically boils down to figuring out the cookie issue or doing it with a full-browser solution. Neither of these seems particularly difficult. – pguardiario Apr 01 '19 at 09:31

0 Answers