I would like to scrape a website by just running code in a browser. In this case, the scraper has to run on a specific machine, and I cannot install any software on that machine. However, there is already a browser installed (recent version of Firefox), and I can configure the browser however I want.
What I would like is a JavaScript solution for scraping, contained in a webpage on site A, that can scrape site B. It seems like this would run into same-origin-policy (CORS) problems; I assume part of the solution is to disable cross-origin checks in the browser.
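To make this concrete, here is the rough shape of what I'm imagining (untested, and it assumes cross-origin checks are somehow disabled; the site-b.example URL is just a placeholder): a page served from site A fetches a page from site B and pulls data out of it with DOMParser.

    // Rough sketch only: assumes the browser's cross-origin checks are
    // disabled somehow; otherwise fetch() is blocked by the same-origin policy.
    async function scrape(url) {
      const response = await fetch(url);   // cross-origin GET of a page on site B
      const html = await response.text();
      const doc = new DOMParser().parseFromString(html, 'text/html');
      // Example extraction: text and absolute URL of every link on the page.
      // Relative hrefs are resolved against the scraped URL, because a
      // DOMParser document inherits the base URL of the page that created it.
      return Array.from(doc.querySelectorAll('a[href]')).map(a => ({
        text: a.textContent.trim(),
        href: new URL(a.getAttribute('href'), url).href
      }));
    }

    scrape('https://site-b.example/some/page')   // placeholder URL
      .then(rows => console.log(rows))
      .catch(err => console.error('Blocked or failed:', err));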
What I have tried so far: searching for "web scraping in javascript" brings up a lot of material intended to run in Node.js with Cheerio (for example, this tutorial), as well as tools like pjscrape, which requires PhantomJS. However, I couldn't find anything equivalent that is intended to run in a browser.
P.S. This is interesting: Firefox setting to enable cross domain ajax request. Apparently Chrome's --disable-web-security flag takes care of the cross-origin/cross-domain issues. Is there a Firefox equivalent?
P.S. The ForceCORS extension for Firefox also looks useful: http://www-jo.se/f.pfleger/forcecors I'm not sure whether I'll be able to install it, though.
P.S. This is a nice collection of ways to allow cross-domain requests in different browsers: http://romkey.com/2011/04/23/getting-around-same-origin-policy-in-web-browsers/ Sadly, the suggested Firefox solution doesn't work in versions >= 5.