How come my BeautifulSoup object doesn't contain the text attribute that my webpage does?

Question

I am using Python 2.7 and turning a requests.content into a BeautifulSoup object. When I use the developer tools to view a div, there is text within the div. When I use BeautifulSoup to return the same div, there is no text in it.

import pandas as pd
from bs4 import BeautifulSoup
from urllib2 import urlopen
from string import punctuation
from requests import get

roster = pd.read_csv(#file w/ player names)

name_corrections = {

    'matt-dellavedova' : 'matthew-dellavedova',
    'marcelinho-huertas' : 'marcelo-huertas',
    'derrick-jones' : 'derrick-jones-jr',
    'john-lucas' : 'john-lucas-iii',
    'james-mcadoo' : 'james-michael-mcadoo',
    'raulzinho-neto' : 'raul-neto',
    'otto-porter' : 'otto-porter-jr',
    'glenn-robinson' : 'glenn-robinson-iii',
    'domas-sabonis' : 'domantas-sabonis',
    'lou-williams' : 'louis-williams',
    'joe-young' : 'joseph-young',

   }

url = 'http://projects.fivethirtyeight.com/carmelo/'

def remove_punctuation(s):
   s = ''.join([i for i in s if i not in ".,'"])
   return s

def process_id(roster):

    roster['538id'] = roster['Player'].apply(remove_punctuation)

    roster['538id'] = roster['538id'].apply(lambda x: x.replace(" ", "-"))

    roster['538id'] = roster['538id'].apply(lambda x: x.lower())

    roster['538id'].replace(name_corrections, inplace=True)


process_id(roster)

page = get(url)

soup = BeautifulSoup(page.content, 'lxml')

soup.find_all('div', class_='market-value')

This returns:

[image: return from code]

[image: view from developer tools]

score 2 · Answer 1 · edited May 23 '17 at 12:00


This is a common issue people have when scraping webpages.

Simply put, the data you want is NOT in that page (http://projects.fivethirtyeight.com/carmelo/) at all, which is why your code returned nothing. What you see in developer tools is the fully rendered page, by which point all the data has been loaded.
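One way to see this for yourself is to parse the HTML as the server sends it, before any script has run. The sketch below is only an illustration: RAW_HTML is a made-up snippet (not the real page's markup) in which the div is empty and a script would fill it in later, and div_texts is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the raw server response.
# The div exists, but its text would only be added later by JavaScript.
RAW_HTML = """
<html><body>
  <div class="market-value"></div>
  <script src="app.js"></script>
</body></html>
"""


def div_texts(html, css_class):
    """Return the stripped text of every div carrying the given class."""
    soup = BeautifulSoup(html, 'html.parser')
    return [div.get_text(strip=True)
            for div in soup.find_all('div', class_=css_class)]


# BeautifulSoup does not execute scripts, so the div is found but has no text.
texts = div_texts(RAW_HTML, 'market-value')
print(texts)
```

The div is matched, yet the only text returned is an empty string, which is the same symptom as in the question.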

The actual data you want comes from this URL (http://projects.fivethirtyeight.com/carmelo/ben-simmons.json), which the browser requests while the original page is loading and rendering.

This might be counter-intuitive; check out this answer for a detailed explanation.
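As a rough sketch of requesting that JSON directly, something like the following could work. It assumes every player's file follows the same pattern as ben-simmons.json (base URL plus id plus ".json"), which is a guess based on that single URL. The names player_json_url and fetch_player_data are made up for this example, and nothing is assumed about the layout of the returned JSON.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7, as used in the question

BASE_URL = 'http://projects.fivethirtyeight.com/carmelo/'


def player_json_url(player_id, base_url=BASE_URL):
    """Build the JSON URL for one player id.

    Assumption: all players follow the ben-simmons.json pattern.
    """
    return base_url + player_id + '.json'


def fetch_player_data(player_id):
    """Download and decode the JSON document for one player."""
    response = urlopen(player_json_url(player_id))
    try:
        return json.loads(response.read().decode('utf-8'))
    finally:
        response.close()
```

Calling fetch_player_data('ben-simmons') would then return the decoded JSON, which you can inspect to find the fields you need. The ids in the roster['538id'] column from the question could be passed in the same way, provided the URL pattern holds for them.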


answered Dec 08 '16 at 04:43

Shane


That makes sense. And that answer/link was a great read to help me understand what is happening on the front end. I also found that you can use Selenium as a webdriver to help solve some of those rendering issues. Thanks for the feedback and answer. – rconnol Dec 29 '16 at 04:53
Glad to know I helped : ) – Shane Dec 29 '16 at 08:44
