Using Beautiful Soup in a class that contains spaces

Question

I'm using Python with Beautiful Soup to scrape a list of 20 games from Steam (http://store.steampowered.com/tags/en-us/RPG/). Each game is wrapped in an a tag rather than a div, so I tried the following:

all_games=soup.find_all('a',{'class':'tab_item   app_impression_tracked'})

(The extra spaces are present in Steam's HTML.)

However, it returned an empty list instead of all the a tags whose class is tab_item app_impression_tracked.

I want to scrape not only each game's name but also its price, discount, and so on. I'm not interested in the link itself; I just want to grab the a tag because it contains all the information I need about each game.

Is there a solution?

Solution:

all_games = soup.find('div', {'id':'NewReleasesRows'}).find_all('a', {'class':'tab_item'})

The spaces were the problem: the class attribute holds a space-separated list of class names, so the class to search for is tab_item, not tab_item app_impression_tracked as I had thought.
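To illustrate why searching for the single class name works, here is a minimal sketch. The HTML is made up to imitate the structure described in the question (it is not Steam's actual markup), and the TopSellersRows id and game names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question;
# this is not a copy of Steam's real HTML.
html = """
<div id="NewReleasesRows">
  <a class="tab_item   app_impression_tracked" href="#1">Game One</a>
  <a class="tab_item" href="#2">Game Two</a>
</div>
<div id="TopSellersRows">
  <a class="tab_item" href="#3">Game Three</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is a multi-valued attribute: Beautiful Soup splits it on whitespace,
# so a search for one class name matches no matter how many other classes
# or extra spaces the attribute contains.
container = soup.find("div", {"id": "NewReleasesRows"})
all_games = container.find_all("a", {"class": "tab_item"})

names = [a.get_text(strip=True) for a in all_games]
print(names)
```

Restricting the search to the container div first keeps rows from other tabs out of the result.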

A similar question was answered here https://stackoverflow.com/a/1080472/6158987 — Songtham T., Feb 27 '18 at 22:47

score 1 · Answer 1 · answered Feb 28 '18 at 06:21

You can also find the items you need with CSS selectors via the soup.select() method. The following code selects the 20 items from the page:

all_games = soup.select("a.tab_item[class*='app_impression_tracked']")

When the classes in a tag's class attribute are separated by spaces, you can also match them by chaining class selectors: "a.tab_item.app_impression_tracked". That selector matches any a tag that carries both classes, regardless of their order or of any other classes on the tag. It looks like the 20 items in the list carry a few different classes. The *= in brackets means the attribute value contains the given string.
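A small sketch can help compare these selectors side by side. The markup below is invented for illustration (the focus class and the game names are hypothetical), so treat the result as a demonstration of selector behavior rather than of Steam's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the "focus" class and the link texts are made up.
html = """
<a class="tab_item   app_impression_tracked" href="#1">Game One</a>
<a class="tab_item focus app_impression_tracked" href="#2">Game Two</a>
<a class="tab_item" href="#3">Game Three</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Chained class selectors: the tag must carry both classes.
both = soup.select("a.tab_item.app_impression_tracked")

# Class selector plus substring match on the class attribute value.
substring = soup.select("a.tab_item[class*='app_impression_tracked']")

# A single class selector ignores any other classes on the tag.
only_tab_item = soup.select("a.tab_item")

print(len(both), len(substring), len(only_tab_item))
```

In this sketch the chained selector also picks up the tag that has a third class, so on markup like this it behaves the same as the substring form.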

score 0 · Answer 2 · answered Oct 01 '20 at 13:14

I faced the same issue and managed to fix it by doing the following:

className = 'tab_item   app_impression_tracked'
all_games = soup.find_all('a', {'class': className.split() if '  ' in className else className})

So, if the class name contains two or more consecutive spaces, we split it on whitespace (so className becomes ['tab_item', 'app_impression_tracked']) and pass the resulting list to find_all to get the elements.
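One thing worth knowing about this approach: when find_all receives a list as a filter, it matches a tag if any item in the list matches, not only if all of them do. The sketch below, on made-up markup, contrasts that with a stricter check that requires every class; the variable names and HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to show the matching behavior.
html = """
<a class="tab_item   app_impression_tracked">Game One</a>
<a class="tab_item">Game Two</a>
<a class="app_impression_tracked">Not a game row</a>
<a class="other">Unrelated</a>
"""

soup = BeautifulSoup(html, "html.parser")

class_name = "tab_item   app_impression_tracked"
wanted = class_name.split()  # ['tab_item', 'app_impression_tracked']

# A list filter matches tags that have ANY of the listed classes.
any_match = soup.find_all("a", {"class": wanted})

# To require ALL of the classes, compare sets of class names explicitly.
all_match = [
    a for a in soup.find_all("a")
    if set(wanted) <= set(a.get("class", []))
]

print([a.get_text() for a in any_match])
print([a.get_text() for a in all_match])
```

If only tags carrying every class are wanted, the explicit set comparison (or a chained CSS selector with soup.select) is the safer choice.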
