
I've been successfully scraping articles (in Python) from a couple of news websites from my country, basically by parsing the main page, fetching the hrefs, and accessing them to parse the articles. But I just hit a wall with https://www.clarin.com/. I am only getting a very limited number of elements because of the infinite scrolling. I researched a lot but couldn't find the right resource to overcome this, though of course it is more than likely that I am doing it wrong.

From what I can see in the DevTools, the request that loads more content returns a JSON file, but I don't know how to fetch it automatically in order to parse it. I would like some quick guidance on what to learn to do this. I hope I made some sense; this is my base code:

    import requests
    from bs4 import BeautifulSoup

    source = requests.get("https://www.clarin.com/")
    html = BeautifulSoup(source.text, "lxml")
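
For context, the rest of what works on the other sites looks roughly like this (a simplified sketch; the `.html` filter is just a placeholder heuristic, each site needs its own rule for what counts as an article link):

    from urllib.parse import urljoin

    # Collect candidate article links from the home page, then fetch each one.
    article_links = set()
    for a in html.find_all("a", href=True):
        link = urljoin("https://www.clarin.com/", a["href"])
        if link.endswith(".html"):  # placeholder heuristic for article pages
            article_links.add(link)

    for link in article_links:
        article = BeautifulSoup(requests.get(link).text, "lxml")
        # ... parse title, body, date, etc. from `article` here

This only ever sees the links present in the initial HTML, which is why the infinite scrolling is a problem.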

This is an example request URL I am seeing in Chrome DevTools:

https://www.clarin.com/ondemand/eyJtb2R1bGVDbGFzcyI6IkNMQUNsYXJpbkNvbnRhaW5lckJNTyIsImNvbnRhaW5lcklkIjoidjNfY29sZnVsbF9ob21lIiwibW9kdWxlSWQiOiJtb2RfMjAxOTYyMjQ4OTE0MDgzIiwiYm9hcmRJZCI6IjEiLCJib2FyZFZlcnNpb25JZCI6IjIwMjAwNDMwXzAwNjYiLCJuIjoiMiJ9.json
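
The long path segment between /ondemand/ and .json looks like base64-encoded JSON (the "eyJ" prefix is base64 for `{"`), so it can at least be decoded to inspect the request parameters. Below is a sketch of what I mean; treating "n" as a page counter is only a guess from the decoded values, and I don't know the structure of the JSON response:

    import base64
    import json

    import requests

    # The example request URL captured in DevTools (same as above).
    url = "https://www.clarin.com/ondemand/eyJtb2R1bGVDbGFzcyI6IkNMQUNsYXJpbkNvbnRhaW5lckJNTyIsImNvbnRhaW5lcklkIjoidjNfY29sZnVsbF9ob21lIiwibW9kdWxlSWQiOiJtb2RfMjAxOTYyMjQ4OTE0MDgzIiwiYm9hcmRJZCI6IjEiLCJib2FyZFZlcnNpb25JZCI6IjIwMjAwNDMwXzAwNjYiLCJuIjoiMiJ9.json"

    # Decode the base64 path segment to see the request parameters.
    token = url.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    padded = token + "=" * (-len(token) % 4)  # restore any stripped base64 padding
    params = json.loads(base64.b64decode(padded))
    print(params)
    # {'moduleClass': 'CLAClarinContainerBMO', 'containerId': 'v3_colfull_home',
    #  'moduleId': 'mod_201962248914083', 'boardId': '1',
    #  'boardVersionId': '20200430_0066', 'n': '2'}

    # Guess: "n" is a chunk/page counter, so bumping it and re-encoding the
    # token should request the next batch of articles.
    params["n"] = str(int(params["n"]) + 1)
    next_token = base64.b64encode(
        json.dumps(params, separators=(",", ":")).encode()
    ).decode().rstrip("=")
    next_url = "https://www.clarin.com/ondemand/" + next_token + ".json"
    data = requests.get(next_url).json()  # response structure still unknown to me

Is something along these lines the right way to go, or should I be driving a real browser instead?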

Carataco88
  • Does this answer your question? [scrape websites with infinite scrolling](https://stackoverflow.com/questions/12519074/scrape-websites-with-infinite-scrolling) – Caleb Stanford Apr 30 '20 at 17:29
  • That will help; I will use that info and build my way through with the Selenium documentation. Thanks! – Carataco88 May 01 '20 at 07:25

0 Answers