0

Hello. I'm a Python beginner and fairly new here as well. I didn't want to ask, but I can't find an answer anywhere. I simply (or so I thought) want to scrape this site to get a random word, but I can't work out which tags to use to filter the HTML. Any suggestions or good resources would be really appreciated! Here's my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.randomlists.com/random-words?dup=false&qty=1'
page = requests.get(url)  # returns the HTML as the server sent it
soup = BeautifulSoup(page.text, 'html.parser')
review_text = soup.find_all(class_='support')
print(review_text)
>> []

I keep changing the find_all() argument, but I can't figure out the right one.

  • 1
    The word is loaded using JS, so you cannot scrape it with bs4 alone. Try using Selenium instead: https://stackoverflow.com/q/49939123/8878627 – S P Sharan Jan 04 '22 at 15:20

3 Answers

1

First of all, find some basic tutorials on YouTube to learn how to scrape simple websites and how everything works overall. After that, I recommend studying this book and practicing.

0

The random word is generated on the page with JavaScript. In other words, when your browser sends a request to the server, it gets the HTML (initial DOM), CSS, and JavaScript files as the response. Your browser then executes the JavaScript, which inserts the element (the random word) into the HTML (now the modified DOM).

When you use requests.get(url), you only get the HTML (initial DOM), so you cannot scrape the random word: it does not exist there yet!
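To make the difference concrete, here is a small offline sketch. Both HTML strings are made up for illustration (the real page's markup and the class name `word` are assumptions, not taken from the site), and it uses Python's built-in `html.parser` instead of bs4 only so that it runs without installing anything. The same search returns nothing on the initial HTML and finds the word once the script has inserted it.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect the text found inside elements that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.matches = []   # one text entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1
            if self.depth == 1:
                self.matches.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def texts_with_class(html, class_name):
    """Return the stripped text of every element that has class_name."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical markup: what the server sends vs. what exists after the script ran.
initial_html = '<div id="result"></div><script src="app.js"></script>'
rendered_html = '<div id="result"><span class="word">example</span></div>'

print(texts_with_class(initial_html, "word"))   # []
print(texts_with_class(rendered_html, "word"))  # ['example']
```

This is the same situation as the empty list in the question: the search itself is fine, but it runs against HTML that does not contain the element yet.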

Therefore, in order to get the HTML (modified DOM), you have to scrape the page after the JavaScript is executed.

There are many solutions for this; please refer to this post.
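One of those solutions is Selenium, as the comment on the question suggests. Below is a rough sketch, not a definitive implementation: the function names are hypothetical, it assumes Selenium and Chrome are installed, and the CSS selector for the word is something you would have to look up in your browser's "Inspect" view. A hand-rolled polling loop stands in for Selenium's own `WebDriverWait` here so that the waiting logic can be tried with a stub driver.

```python
import time


def wait_for_rendered_html(driver, url, css_selector, timeout=10.0, poll=0.25):
    """Open url and return the page HTML once css_selector matches an element.

    `driver` is expected to behave like a Selenium WebDriver, i.e. to provide
    get(), find_elements() and page_source.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source  # the modified DOM, after JavaScript ran
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r} within {timeout}s")
        time.sleep(poll)


def fetch_with_chrome(url, css_selector):
    """Run the helper above in a headless Chrome session (needs Selenium)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return wait_for_rendered_html(driver, url, css_selector)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

You would call `fetch_with_chrome(url, selector)` with the URL from the question and the selector you found, then pass the returned HTML to `BeautifulSoup(html, 'html.parser')` exactly as before. The only change is where the HTML comes from.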


PS. How can you verify that the random word is generated by JavaScript?

Disable JavaScript in your browser and reload the page; you will not see the random word.

Johnny
  • 1
    Now it makes sense. That's why, looking at the source code, I couldn't find the block for generating the word as it was in 'Inspect'... Thanks mate. Now that I have the other post, this is solved. Yeah... And I just did disable JavaScript for a moment and word is now shown – caviarnetherite Jan 04 '22 at 16:27