find div class by the element text inside it

Question

I am crawling a game website and I want to get the div that contains a certain text. In this case I want the div with class "GameItemWrap" that contains an a tag (link) with the text "SANDBOX Ghost". There are many GameItemWrap divs on the page, and I don't want just the inner "SummonerName" div, because there are other elements inside "GameItemWrap" that I need.

This is what I have tried:

duo_name='SANDBOX Ghost'    
gamelist=soup.find('div',"GameItemList")# "GameItemList" is a div that contains "GameItemWrap"
games=gamelist.find_all('GameItemWrap',{('a'):duo_name })

This is what the HTML I am crawling looks like:

<div class="GameItemWrap>
    #some other div classes that i will need in the future 
    <div class="SummonerName">                                                       
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>                                                 
    </div>
</div>

I am expecting 4 GameItemWrap divs that include the text "SANDBOX Ghost", but when I run

print(len(games))

the output is 0, so this does not work. Also, I do not want to loop over every single GameItemWrap div to check whether it contains "SANDBOX Ghost". Is this possible?

Possible duplicate of [Using BeautifulSoup to find a HTML tag that contains certain text](https://stackoverflow.com/questions/866000/using-beautifulsoup-to-find-a-html-tag-that-contains-certain-text) — Smart Manoj, May 21 '19 at 04:09

score 0 · Accepted Answer · answered May 21 '19 at 04:59

After fixing the HTML shown (the class attribute on the first line is missing its closing quote), with bs4 4.7.1 I would expect you to be able to use the :contains pseudo-class:

from bs4 import BeautifulSoup as bs

html ='''
<div class="GameItemWrap">
    #some other div classes that i will need in the future 
    <div class="SummonerName">                                                       
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>                                                 
    </div>
</div>
'''
duo_name = 'SANDBOX Ghost'
soup = bs(html, 'lxml')  # use 'html.parser' if lxml is not installed
# Select every .GameItemWrap whose text (including descendants) contains duo_name
items = soup.select('.GameItemWrap:contains("' + duo_name + '")')
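Note that newer Soup Sieve releases (2.1 and later, as far as I know) spell this pseudo-class :-soup-contains and treat the bare :contains form as deprecated. Below is a minimal sketch of the same selector in that spelling. The second wrapper and the name "Another Player" are made up for illustration, and 'html.parser' is used only so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Two wrappers; only the first one holds the name we are looking for.
# "Another Player" is a made-up name used only for this sketch.
html = '''
<div class="GameItemList">
  <div class="GameItemWrap">
    <div class="SummonerName"><a class="Link">SANDBOX Ghost</a></div>
  </div>
  <div class="GameItemWrap">
    <div class="SummonerName"><a class="Link">Another Player</a></div>
  </div>
</div>
'''

duo_name = 'SANDBOX Ghost'
soup = BeautifulSoup(html, 'html.parser')

# :-soup-contains matches elements whose text, including the text of
# descendants, contains the given string (assumes Soup Sieve >= 2.1).
items = soup.select('.GameItemWrap:-soup-contains("' + duo_name + '")')

print(len(items))
```

Each entry in items is the whole GameItemWrap div, so the other child divs stay reachable from it.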

Dhamodharan · Answer 2 · 2019-05-21T06:58:35.963

0

If your target text is inside an a tag, try something like the following:

duo_name='SANDBOX Ghost'
games = soup.find_all('a',string=duo_name)

The complete code looks like this:

from bs4 import BeautifulSoup
chunk = '''<div class="GameItemWrap">
    #some other div classes that i will need in the future
    <div class="SummonerName">
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>
    </div>
</div>'''
soup = BeautifulSoup(chunk, 'html5lib')  # requires html5lib; 'html.parser' also works
game_data = {}
duo_name = 'SANDBOX Ghost'
for wrap in soup.find_all('div', {'class': 'GameItemWrap'}):
    # Look the link up once and reuse it
    a_tag = wrap.find('a', string=duo_name)
    if a_tag:
        chunk_for_future = wrap  # the whole GameItemWrap div, kept for later use
        game_data[a_tag.text] = a_tag['href']
print(game_data)

and the result (shown as a dict) will be:

{'SANDBOX Ghost': '//www.op.gg/summoner/userName=SANDBOX+Ghost'}
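If looping over every GameItemWrap is not wanted, one possible variation is to search for the matching a tags first and then climb up to the enclosing wrapper with find_parent. This is only a sketch: the "GameResult" class, its contents, and the name "Another Player" are hypothetical placeholders for the other divs mentioned in the question.

```python
from bs4 import BeautifulSoup

# "GameResult" and "Another Player" are hypothetical placeholders.
html = '''
<div class="GameItemList">
  <div class="GameItemWrap">
    <div class="GameResult">Victory</div>
    <div class="SummonerName"><a class="Link">SANDBOX Ghost</a></div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName"><a class="Link">Another Player</a></div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName"><a class="Link">SANDBOX Ghost</a></div>
  </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
duo_name = 'SANDBOX Ghost'

# Key by id() so a wrapper holding several matching links is kept once.
# (Tag equality compares content, so "in" checks on a list could wrongly
# merge two distinct wrappers that happen to look identical.)
found = {}
for a_tag in soup.find_all('a', string=duo_name):
    wrap = a_tag.find_parent('div', class_='GameItemWrap')
    if wrap is not None:
        found[id(wrap)] = wrap

wraps = list(found.values())
print(len(wraps))
```

Only the links whose text equals duo_name are visited, and each entry in wraps is the full GameItemWrap div.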

edited May 21 '19 at 06:58

answered May 21 '19 at 06:30

Dhamodharan


But finding tags will only get me that tag and not the bigger tag that I need, if I understood your answer correctly. – dusrud May 21 '19 at 06:49
Yes, I missed one line in your question. I have updated the answer. Hope it solves the issue. – Dhamodharan May 21 '19 at 06:59
Your answer iterates over every 'GameItemWrap' div, which I didn't want to do. – dusrud May 21 '19 at 08:11
