
I am crawling a game website and want to get the div that contains a certain text. Specifically, I want the div with class "GameItemWrap" that contains a link (an a tag) with the text "SANDBOX Ghost". There are many "GameItemWrap" divs on the page. I don't want just the inner "SummonerName" div, because "GameItemWrap" contains other divs that I also need.

This is what I have tried:

duo_name='SANDBOX Ghost'    
gamelist=soup.find('div',"GameItemList")# "GameItemList" is a div that contains "GameItemWrap"
games=gamelist.find_all('GameItemWrap',{('a'):duo_name })

This is what the HTML I am crawling looks like:

<div class="GameItemWrap>
    #some other div classes that i will need in the future 
    <div class="SummonerName">                                                       
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>                                                 
    </div>
</div>

I expect 4 "GameItemWrap" divs that include the text "SANDBOX Ghost", but when I print

print(len(games)) 

the output is 0, so this does not work. I would also prefer not to loop over every "GameItemWrap" div to check whether it contains "SANDBOX Ghost". Is this possible?

QHarr
dusrud
  • Possible duplicate of [Using BeautifulSoup to find a HTML tag that contains certain text](https://stackoverflow.com/questions/866000/using-beautifulsoup-to-find-a-html-tag-that-contains-certain-text) – Smart Manoj May 21 '19 at 04:09

2 Answers


After fixing the HTML shown, with bs4 4.7.1 I would expect you to be able to use the :contains pseudo-class:

from bs4 import BeautifulSoup as bs

html ='''
<div class="GameItemWrap">
    #some other div classes that i will need in the future 
    <div class="SummonerName">                                                       
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>                                                 
    </div>
</div>
'''
duo_name = 'SANDBOX Ghost'
soup = bs(html, 'lxml') #'html.parser' if lxml not installed
items = soup.select('.GameItemWrap:contains("' + duo_name + '")')
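
To see how this behaves when the page holds several wrappers, here is a minimal sketch. It assumes Soup Sieve 2.1 or later, where the pseudo-class is spelled `:-soup-contains` (the plain `:contains` spelling is deprecated there). The `GameResult` class and the second player name are hypothetical stand-ins for the "other divs" mentioned in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: three wrappers, two of which mention the player.
html = """
<div class="GameItemList">
  <div class="GameItemWrap">
    <div class="GameResult">Victory</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link">SANDBOX Ghost</a>
    </div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=Another+Player" class="Link">Another Player</a>
    </div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link">SANDBOX Ghost</a>
    </div>
  </div>
</div>
"""

duo_name = "SANDBOX Ghost"
soup = BeautifulSoup(html, "html.parser")

# Select every wrapper whose text (including descendants) contains duo_name.
# Assumption: Soup Sieve >= 2.1; older releases only know ":contains".
items = soup.select(f'.GameItemWrap:-soup-contains("{duo_name}")')

# Each item is the whole wrapper, so sibling divs stay reachable.
results = [item.select_one(".GameResult").get_text(strip=True) for item in items]
print(len(items), results)
```

Note that this is a substring match on the wrapper's text, so a name such as "SANDBOX Ghost2" would match as well.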
QHarr

Assuming your target text sits inside an a tag, you can match on it like this:

duo_name='SANDBOX Ghost'
games = soup.find_all('a',string=duo_name)

The complete code looks like this:

from bs4 import BeautifulSoup

chunk = '''<div class="GameItemWrap">
    #some other div classes that i will need in the future
    <div class="SummonerName">
        <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link" target="_blank">SANDBOX Ghost</a>
    </div>
</div>'''
soup = BeautifulSoup(chunk, 'html5lib')  # needs the html5lib package installed
game_data = {}
duo_name = 'SANDBOX Ghost'
for wrap in soup.find_all('div', {'class': 'GameItemWrap'}):
    # Look the link up once; find() returns None when nothing matches
    a_tag = wrap.find('a', string=duo_name)
    if a_tag is not None:
        chunk_for_future = wrap  # the whole GameItemWrap div, kept for later use
        game_data[a_tag.text] = a_tag['href']
print(game_data)

and the result (shown as a dict) will be:

{'SANDBOX Ghost': '//www.op.gg/summoner/userName=SANDBOX+Ghost'}
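
If you would rather not test every wrapper, one possible alternative is to start from the matching a tags and walk up to the enclosing wrapper with find_parent. This is only a sketch: the `GameResult` class and the second player name are hypothetical, and `string=` needs an exact match on the link text.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: three wrappers, two of which link to the player.
html = """
<div class="GameItemList">
  <div class="GameItemWrap">
    <div class="GameResult">Victory</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link">SANDBOX Ghost</a>
    </div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=Another+Player" class="Link">Another Player</a>
    </div>
  </div>
  <div class="GameItemWrap">
    <div class="GameResult">Defeat</div>
    <div class="SummonerName">
      <a href="//www.op.gg/summoner/userName=SANDBOX+Ghost" class="Link">SANDBOX Ghost</a>
    </div>
  </div>
</div>
"""

duo_name = "SANDBOX Ghost"
soup = BeautifulSoup(html, "html.parser")

wrappers = []
seen_ids = set()  # track identity so one wrapper is not added twice
for a_tag in soup.find_all("a", string=duo_name):
    # Walk up from the link to the nearest enclosing GameItemWrap div.
    wrapper = a_tag.find_parent("div", class_="GameItemWrap")
    if wrapper is not None and id(wrapper) not in seen_ids:
        seen_ids.add(id(wrapper))
        wrappers.append(wrapper)

# The other divs inside each wrapper remain available.
results = [w.find("div", class_="GameResult").get_text(strip=True) for w in wrappers]
print(len(wrappers), results)
```

Identity (`id`) is used for de-duplication because two Tag objects with identical markup compare as equal, which would otherwise drop a second wrapper that happens to look the same.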
Dhamodharan