
Summary

I am trying to scrape Quizlet using Pyppeteer (the Python port of Puppeteer). However, because the site uses React, I am having trouble obtaining the prop data from its components while scraping. Looking at the prop data, I found an attribute called linkTo that contains the link to each flashcard set, which is what I am trying to obtain.

What I am looking for

I am looking for a way to obtain this prop data using Pyppeteer, or to understand where prop data is stored in the browser so that I can access it while web scraping. Unfortunately, Quizlet does not use <a> tags in its card components, so I cannot retrieve the links from there.

Below is an image of the React components I am looking to scrape.

[Image: the React components I am trying to scrape]

Below is the prop data, as shown in React Dev Tools, for the components I am trying to scrape.

{
  "hasHoverState": true,
  "isActive": false,
  "linkTo": "https://quizlet.com/176362686/ns201-flash-cards/",
  "onClick": "ƒ onClick() {}",
  "size": "small",
  "children": "<div />"
}

What I’ve Tried

I tried going through the HTML to find any prop-related data, but to no avail.

What I’m currently doing

I am currently having Pyppeteer click every element containing the attribute linkTo and take the URL from each resulting page. This is quite slow: it can take up to 30 seconds to obtain one set of flashcards.
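
For context on where this data might live: React generally attaches its internal "fiber" object to each rendered DOM node under a private key that starts with __reactFiber$ (React 17 and later) or __reactInternalInstance$ (React 16), and the fiber's memoizedProps holds the props. The sketch below assumes that holds for Quizlet's build, which has not been verified here; these keys are undocumented and may change. The names FIND_LINKS_JS, unique_links and collect_link_to, and the depth limit of 5, are hypothetical choices for illustration only.

```python
# JavaScript evaluated inside the page.
# ASSUMPTION: React exposes its fiber on DOM nodes under a private key that
# starts with "__reactFiber$" (React 17+) or "__reactInternalInstance$"
# (React 16). This is an internal detail, not a supported API.
FIND_LINKS_JS = """() => {
  const links = [];
  for (const node of document.querySelectorAll('*')) {
    const key = Object.keys(node).find(k =>
      k.startsWith('__reactFiber$') ||
      k.startsWith('__reactInternalInstance$'));
    if (!key) continue;
    // Walk up a few parent fibers looking for a string linkTo prop.
    let fiber = node[key];
    for (let depth = 0; fiber && depth < 5; depth++) {
      const props = fiber.memoizedProps;
      if (props && typeof props === 'object' &&
          typeof props.linkTo === 'string') {
        links.push(props.linkTo);
        break;
      }
      fiber = fiber.return;
    }
  }
  return links;
}"""


def unique_links(links):
    """Drop duplicates and non-string entries, keeping first-seen order."""
    seen = set()
    result = []
    for link in links:
        if isinstance(link, str) and link not in seen:
            seen.add(link)
            result.append(link)
    return result


async def collect_link_to(url):
    """Open url in headless Chromium and return the linkTo values found."""
    # Imported here so unique_links can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # Wait for network activity to settle so React has rendered the cards.
        await page.goto(url, {"waitUntil": "networkidle2"})
        raw = await page.evaluate(FIND_LINKS_JS)
        return unique_links(raw)
    finally:
        await browser.close()
```

If the assumption holds, something like asyncio.run(collect_link_to(page_url)) would return every link in a single page load instead of one click per card, where page_url stands for the page that lists the sets. Many child nodes share the same parent fiber, which is why the result is de-duplicated.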

Any help would be greatly appreciated!

Tosin Kuye