
Side note: I'm a total noob here, so I don't actually know whether the page is a JavaScript page. When I inspect an element, it shows HTML code, but when I hover my mouse over a page number, the browser shows javascript:void(0).

I was looking at this post, as well as a few others, on how to scrape multiple pages using Python Requests and Beautiful Soup.

My situation is similar to the OP of the aforementioned post:

  • The URL does not change when I click on a new page
  • I'm able to scrape one page, but there are multiple pages (possibly thousands in my case)

But between my case and the OP's case, there are also a few differences:

  • On the website mentioned in that post, hovering over "2" or "3" shows javascript:goToPage("2") in the bottom-left corner of the browser; on the page I'm looking at, every page link shows javascript:void(0) instead.
  • I also don't see a POST request when I inspect the elements, so I can't follow the solution provided there.
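A javascript:void(0) link carries no destination URL at all: the browser does nothing with the href, and the page change happens in a JavaScript click handler attached to the link. The sketch below, using only the standard-library `html.parser`, lists the page-number links whose href is a `javascript:` pseudo-URL, which shows why Requests has nothing to follow. The sample pager markup and the function names are hypothetical, not taken from the actual site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (text, href) tuples
        self._href = None          # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value); value can be None, so default to ""
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def js_only_page_links(html):
    """Return page-number labels whose href is a javascript: pseudo-URL."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        text
        for text, href in collector.links
        if text.isdigit() and href.strip().lower().startswith("javascript:")
    ]


# Hypothetical pager markup resembling what the question describes.
sample = """
<div class="pager">
  <a href="javascript:void(0)">1</a>
  <a href="javascript:void(0)">2</a>
  <a href="javascript:void(0)">3</a>
  <a href="/help">Help</a>
</div>
"""
print(js_only_page_links(sample))  # ['1', '2', '3']
```

If every page link comes back from a check like this, the page data must be fetched by the JavaScript itself, so the next step is either to find that underlying request or to execute the JavaScript.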

Again, I'm able to scrape one page, but I don't know how to scrape all the pages at once.
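When a click handler swaps in new content without changing the URL, it often fetches that content with a background request, which can be found in the browser developer tools' Network tab (not the Elements view) by clicking page 2 and watching for a new XHR/fetch entry. Assuming that request turns out to be a GET with a page-number query parameter (the `page` parameter name and the example URL below are hypothetical), the loop could look like this sketch. `fetch` and `parse` are passed in so the same loop works with a logged-in `requests.Session` and your existing single-page Beautiful Soup code.

```python
from urllib.parse import urlencode


def page_url(base_url, page):
    """Build one page's URL, assuming a hypothetical 'page' query parameter."""
    return f"{base_url}?{urlencode({'page': page})}"


def scrape_all_pages(fetch, parse, base_url, max_pages=10_000):
    """Request pages 1, 2, 3, ... until a page yields no rows.

    fetch: callable(url) -> response body
    parse: callable(body) -> list of rows extracted from one page
    """
    rows = []
    for page in range(1, max_pages + 1):
        page_rows = parse(fetch(page_url(base_url, page)))
        if not page_rows:          # an empty page means we ran past the last one
            break
        rows.extend(page_rows)
    return rows


# Offline demo with a fake two-page site (no network access needed).
fake_site = {1: ["row 1", "row 2"], 2: ["row 3"]}
rows = scrape_all_pages(
    fetch=lambda url: fake_site.get(int(url.rsplit("=", 1)[1]), []),
    parse=lambda body: body,
    base_url="https://example.com/list",
)
print(rows)  # ['row 1', 'row 2', 'row 3']
```

Against the real site, `fetch` could be `lambda url: session.get(url).text` on a `requests.Session` that has already logged in, since a session keeps the login cookies between requests. If the Network tab shows a POST instead, the same loop applies with the page number moved into the request body.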

TheRealFakeNews
  • Hi, can you please post the link of the site you're trying to scrape? Even if it's showing `javascript:void(0)`, we can try a couple of things before concluding the site totally needs to render JS to get data. – WGS Oct 05 '15 at 03:27
  • @TheLaughingMan Thanks for responding to my comment. I can't post the link b/c it's private, but I'll try to find something similar – TheRealFakeNews Oct 05 '15 at 04:27
  • By private, do you mean it's an internal link/page? – WGS Oct 05 '15 at 04:53
  • @TheLaughingMan By private, I mean you need a password and username to get into the page – TheRealFakeNews Oct 06 '15 at 01:02

1 Answer


You can try moving from Python to a JavaScript solution and setting up an environment that executes the page's JavaScript. In general that's the only reliable solution: the JavaScript code can be obscured as much as the site's authors like, so you can't really scrape the content unless you execute it.
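One way to execute the page's JavaScript without leaving Python is to drive a real browser, for example with Selenium (a swapped-in tool; this answer does not name it). The sketch below assumes a Selenium 4 WebDriver that is already logged in and showing page 1, and that each page number appears as a clickable link whose text is the number; pagers that only show a sliding window of numbers or a "Next" button would need a different link lookup.

```python
import time


def scrape_with_browser(driver, max_pages=1000, delay=1.0):
    """Collect the rendered HTML of each page by clicking its page-number link.

    driver: assumed Selenium 4 WebDriver, logged in and showing page 1.
    "link text" is the string value of selenium's By.LINK_TEXT locator.
    """
    pages = [driver.page_source]               # page 1 HTML after its JS has run
    for number in range(2, max_pages + 1):
        links = driver.find_elements("link text", str(number))
        if not links:                           # no link for this number: stop
            break
        links[0].click()                        # runs the javascript:void(0) link's handler
        time.sleep(delay)                       # crude wait for the new page to render
        pages.append(driver.page_source)
    return pages
```

Each string in the returned list can then go through the existing Beautiful Soup code for a single page. A fixed `time.sleep` is the simplest choice here; Selenium's `WebDriverWait` with an expected condition is the sturdier way to wait for the new content.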

lilezek