beautifulsoup find function returns "-" when retrieving text

Question

I am trying to get the text inside a span tag that has an id attribute using BeautifulSoup, but it returns only a '-' instead of the text.

I have also tried locating the enclosing div by its class attribute and then navigating to the span with findChildren(), but that still returns "-". Here is the HTML I am trying to scrape, from the website https://etherscan.io/tokens-nft.

<div class="row align-items-center">
<div class="col-md-4 mb-1 mb-md-0">Transfers:</div>
<div class="col-md-8"></div>
<span id="totaltxns">266,765</span><hr class="hr-space">
</div>

And here is my Python code:

from urllib.request import Request, urlopen  # Python 3; Python 2 used urllib2
from bs4 import BeautifulSoup as soup

url = 'https://etherscan.io/tokens-nft'
response = Request(url, headers = {'User-Agent':'Mozilla/5.0'})
page_html = urlopen(response).read()
page_soup = soup (page_html,'html.parser')
count = 0
total_nfts = 2691 #Hard-coded value
supply = []
totalAddr = []
transCount = []
row = []
print('All non-fungible tokens in order of Transfers')

for nfts in page_soup.find_all("a", class_ ='text-primary'):
    link = nfts.get('href')
    new_url = "https://etherscan.io/"+link
    name = nfts.text
    print('NFT '+name)

    response2 = Request(new_url, headers = {'User-Agent':'Mozilla/5.0'})
    phtml = urlopen(response2).read()
    psoup = soup (phtml,'html.parser')

    #Get tags 
    tags = []
    #print('Tags')
    for allTags in psoup.find_all("a",class_ = 'u-label u-label--xs u-label--secondary'):
        tags.append(allTags.text.encode("ascii"))
    count+=1
    if(len(tags)!=0):
        print(tags)


    #Get total supply
    ts = psoup.find("span", class_ = "hash-tag text-truncate")   
    ts = ts.text
    #print(ts)

    #Get holders
    holders = psoup.find("div", {"id":"ContentPlaceHolder1_tr_tokenHolders"}) 
    holders = holders.findChildren()[1].findChildren()[1].text
    #print(holders)

    #Get transfers/transactions
    print(psoup.find("span", attrs={"id":"totaltxns"}).text)

print('Total number of NFTS '+str(count))

I have also tried:

transfers = psoup.find("span", attrs={"id":"totaltxns"})

but that doesn't work either.

The correct parsing should return 266,765.

it is a very long output since the print(phtml) is in a loop — Maham Zaidi, Nov 02 '19 at 12:55

Rithin Chalumuri · Accepted Answer · 2019-11-03T11:46:25.170

To find the element by id you can use soup.find(id='your_id').

Try this:

from bs4 import BeautifulSoup as bs

html = '''
<div class="row align-items-center">
<div class="col-md-4 mb-1 mb-md-0">Transfers:</div>
<div class="col-md-8"></div>
<span id="totaltxns">266,765</span><hr class="hr-space">
</div>
'''

soup = bs(html, 'html.parser')

print(soup.find(id='totaltxns').text)

Outputs:

266,765

If you look at the page source for the link you mentioned, the span with id totaltxns contains only "-". That is why find() returns "-".

The real value is probably filled in by JavaScript after the page loads.
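Because the raw page source can carry a placeholder instead of the number, it may help to detect that case explicitly before using the value. The following is a minimal sketch, not part of the original answer; the helper name parse_count and the set of placeholder strings are assumptions made for illustration.

```python
import re

# Assumption: strings the server may send before JavaScript fills in the value.
PLACEHOLDERS = {"", "-"}


def parse_count(text):
    """Return the integer in a string like '266,765', or None for a placeholder."""
    cleaned = text.strip()
    if cleaned in PLACEHOLDERS:
        return None
    digits = cleaned.replace(",", "")
    if not re.fullmatch(r"\d+", digits):
        raise ValueError("unexpected count format: %r" % text)
    return int(digits)


print(parse_count("266,765"))  # 266765
print(parse_count("-"))        # None
```

A None result signals that the page has to be rendered in a browser (or the data fetched some other way) before the count is available.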

UPDATE

urlopen().read() simply returns the initial page source received from the server without any further client-side changes.
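To illustrate the point, here is a small sketch that uses only the standard library's html.parser (the module that BeautifulSoup's 'html.parser' option builds on). The HTML string is a made-up stand-in for the server response, not the actual Etherscan source, and the class name SpanTextById is hypothetical. The parser reads the script as plain text and never runs it, so the placeholder is what comes back.

```python
from html.parser import HTMLParser

# Hypothetical page source: the server sends a "-" placeholder plus a script
# that a browser would run to overwrite it.
RAW_HTML = """
<span id="totaltxns">-</span>
<script>document.getElementById("totaltxns").textContent = "266,765";</script>
"""


class SpanTextById(HTMLParser):
    """Collect the text inside <span> elements that have the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


parser = SpanTextById("totaltxns")
parser.feed(RAW_HTML)
print(parser.text)  # -
```

The script is never executed, so the span still holds the placeholder, just as in the response returned by urlopen().read().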

You can get the desired output using Selenium with the Chrome WebDriver. The idea is to let the JavaScript on the page run in a real browser and then parse the final page source.

Try this:

from bs4 import BeautifulSoup as bs
from selenium.webdriver import Chrome # pip install selenium
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

url='https://etherscan.io/token/0x629cdec6acc980ebeebea9e5003bcd44db9fc5ce'

# Make it headless, i.e. run in the background without opening a Chrome window
chrome_options = Options()
chrome_options.add_argument("--headless")

# Use Chrome to get the page with JavaScript-generated content.
# Selenium 4 takes the driver path via Service; Selenium 3 used executable_path="./chromedriver".
with Chrome(service=Service("./chromedriver"), options=chrome_options) as browser:
    browser.get(url)
    page_source = browser.page_source

#Parse the final page source
soup = bs(page_source, 'html.parser')

print(soup.find(id='totaltxns').text)

Outputs:

995,632

More information on setting up the WebDriver, with an example, is available in another Stack Overflow question.

Can you `print(phtml)` and update the output? I suspect you might not be getting the expected html markup. The code above works perfectly for the string. — Rithin Chalumuri, Nov 02 '19 at 12:28
I am retrieving a few other tags from the html of the website correctly as well. Only this is giving incorrect output. — Maham Zaidi, Nov 02 '19 at 12:35
The example in the question is not reproducible. Could you update it so it's a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example)? — Rithin Chalumuri, Nov 02 '19 at 12:37
Just updated to add my complete code and the link of the website. The code scrapes the list of all the tokens, and it also scrapes the tags, holders, total supply and transfers of each token by visiting each hyperlinked token. An example link is https://etherscan.io/token/0x629cdec6acc980ebeebea9e5003bcd44db9fc5ce — Maham Zaidi, Nov 02 '19 at 12:42
Any idea how to retrieve the span text? If you click Inspect Element, the span tag does have a value, for example 385,547. — Maham Zaidi, Nov 02 '19 at 13:12
