
I have a Flask app with an endpoint that uses Selenium and ChromeDriver. The endpoint scrapes pages of an Angular website, builds a JSON document from them, and serves it to the client.

Earlier, this site could be scraped easily with BeautifulSoup, and I stored the JSON in the datastore along with the time of the put operation. That way I didn't scrape the website on every client request: a function made sure at least 5 hours had passed since the last put before the website was scraped again.
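For reference, the 5-hour freshness check described above could look roughly like the sketch below. The names `needs_rescrape` and `MAX_AGE` are hypothetical, and it assumes the stored put time is read back as a timezone-aware UTC `datetime`; the datastore read and write are left out.

```python
from datetime import datetime, timedelta, timezone

# Minimum time between two scrapes (the 5 hours mentioned above).
MAX_AGE = timedelta(hours=5)


def needs_rescrape(last_put, now=None, max_age=MAX_AGE):
    """Return True if the cached JSON is missing or older than max_age.

    last_put: timezone-aware datetime of the last put, or None if
              nothing has been stored yet.
    now:      timezone-aware "current time"; defaults to the current UTC
              time and is a parameter mainly so the check can be tested.
    """
    if last_put is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_put >= max_age
```

The endpoint would call this with the stored timestamp and only run the scraper when it returns `True`; otherwise it serves the cached JSON.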

But now I have to use Selenium, and I can't see how Selenium can open a browser on the server (as it needs to on my local machine to do anything). I also looked into headless Chrome, but as far as I can tell it currently only works with a Node.js server.

The only option I see right now is to scrape the site on my local machine and upload the JSON to the GAE datastore every time new data is published on the website (which happens on a specific day of the month). Is there a way to automate the process completely?

code_tinkerer

1 Answer


I was able to get headless Selenium running on GAE Python, but I had to do it in App Engine Flex. See this answer:

Python Headless Browser for GAE
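As a rough illustration of what "headless" means here, the sketch below starts Chrome through Selenium without opening a window. It is not the code from the linked answer. It assumes Selenium 3.8 or later and that a Chrome binary and a matching ChromeDriver are installed in the server environment; `HEADLESS_FLAGS` and `make_headless_driver` are hypothetical names, and the exact flags needed can vary by environment.

```python
# Chrome command-line switches commonly used when no display is available.
# Which ones are required depends on the environment (assumption).
HEADLESS_FLAGS = [
    "--headless",               # run Chrome without opening a window
    "--no-sandbox",             # often needed inside containers
    "--disable-dev-shm-usage",  # avoid a small /dev/shm in containers
]


def make_headless_driver(flags=HEADLESS_FLAGS):
    """Create a Chrome WebDriver that runs without a visible browser."""
    # Imported here so the module can be loaded even where Selenium is
    # not installed (for example, in a unit test of the flag list).
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

A caller would then use `driver = make_headless_driver()`, followed by `driver.get(url)`, read `driver.page_source`, and finish with `driver.quit()` so the browser process does not linger on the server.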

What do you mean by "it needs on my local machine to do anything"?

Alex