Python BeautifulSoup resources

Question

So, hello. I'm a Python beginner and fairly new here as well. I didn't want to ask, but I can't seem to find an answer anywhere. I simply (or so I thought) want to scrape this site to get a random word, but I can't work out which tags to use to filter the HTML. Any suggestions or good resources would be really appreciated! Here's my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.randomlists.com/random-words?dup=false&qty=1'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
review_text = soup.find_all(class_='support')
print(review_text)
>> []

I keep changing the find_all() argument, but I can't figure out the right one.

The word is loaded using JS and you cannot scrape that using bs4. Try using Selenium instead: https://stackoverflow.com/q/49939123/8878627 — S P Sharan, Jan 04 '22 at 15:20

score 1 · Answer 1 · answered Jan 04 '22 at 15:21

First of all, try to find some basic tutorials on YouTube to learn how to scrape basic websites and how everything works overall. After that, I recommend you start studying this book and practice.

Iman Hosseini Pour

You think I haven't? None of them really explains how to choose the tags, they just know because they have pre-coded the program! I'll check the book tho, thanks – caviarnetherite Jan 04 '22 at 15:28
Consider using https://selectorgadget.com/ to choose the right tags! – S P Sharan Jan 04 '22 at 15:30

score 1 · Answer 2 · answered Jan 04 '22 at 15:21

Gonna drop a couple resources:

Course: https://automatetheboringstuff.com/chapter11/

YouTube: https://youtu.be/GjKQ6V_ViQE and https://youtu.be/HiOtQMcI5wg

Personal projects:

https://github.com/0sergio-hash/Meal-Plan-scraping-project/blob/a884078eb119ab7ac5d113a8bdab494caad3db05/Meal%20Plan%20Project.ipynb

And

https://github.com/0sergio-hash/Amazon-Web-Scraping-project/blob/main/Amazon%20Web%20Scraper%20Project.ipynb

The second project follows along with the second YouTube video, with some slight modifications.

score 0 · Accepted Answer · answered Jan 04 '22 at 16:15

The random word is generated on the page with JavaScript. In other words, when your browser sends a request to the server, it gets the HTML (initial DOM), CSS and JavaScript files as the response. Your browser then executes the JavaScript, which inserts the element (the random word) into the HTML (now the modified DOM).

When you use requests.get(url), you only get the HTML (initial DOM), so you cannot scrape the random word (because it does not exist there yet)!
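To make the difference concrete, here is a small sketch that runs the same BeautifulSoup query against two HTML strings. Both strings, the id `result` and the class `word` are made up for illustration; they are not copied from the real page, whose markup may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from the real site: the server sends an
# empty container, and a script would fill it in once the browser runs it.
INITIAL_HTML = """
<html><body>
  <ol id="result"></ol>
  <script>/* JavaScript that would insert the word goes here */</script>
</body></html>
"""

# What the DOM might look like after the browser has executed the script.
RENDERED_HTML = """
<html><body>
  <ol id="result"><li><span class="word">lantern</span></li></ol>
</body></html>
"""


def find_words(html):
    """Return the text of every element with the (assumed) class 'word'."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(class_="word")]


print(find_words(INITIAL_HTML))   # []
print(find_words(RENDERED_HTML))  # ['lantern']
```

The first call returns an empty list for the same reason the code in the question does: no find_all() argument can match an element that is not in the HTML that was downloaded.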

Therefore, in order to get the HTML (modified DOM), you have to scrape the page after the JavaScript is executed.

There are many solutions for this; please refer to this post.
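One of those solutions is Selenium, as suggested in the comment on the question. The following is only a rough sketch, assuming Selenium 4, a local Chrome install, and a CSS selector that you have looked up yourself in the browser's Inspect panel. The function names and the `wait_css` parameter are my own, not part of any library.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript ran."""
    # Imported here so the parsing helper below stays usable
    # even on a machine without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until an element matching wait_css exists in the DOM,
        # i.e. until the script has inserted it.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source  # the modified DOM, serialized as HTML
    finally:
        driver.quit()


def extract_texts(html, css):
    """Return the stripped text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css)]
```

Usage would be along the lines of `extract_texts(fetch_rendered_html(url, selector), selector)`, with `url` being the one from the question and `selector` whatever matches the word element on the rendered page. I have not checked the real selector, so treat any value you plug in as a guess until you confirm it in Inspect.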

PS. How to verify the random word is generated by JavaScript?

Disable JavaScript in your browser and reload the page; you will not see the random word.

Now it makes sense, that's why looking at the source code I couldn't find the block for generating the word as it was in the 'Inspect'... Thanks mate. Now that I have the other post, this is solved. Yeah... And I just disabled JavaScript for a moment and the word is not shown – caviarnetherite Jan 04 '22 at 16:27
