
I am attempting to extract all the span information contained in the 'pre' tag (e.g. `<span class="n">data`) on 'https://www.kaggle.com/arthurtok/interactive-intro-to-dimensionality-reduction/notebook' using BeautifulSoup, but I cannot get the information to show up. I keep getting "AttributeError: 'NoneType' object has no attribute 'contents'".

Here is the code I am currently using:

import urllib.request
from bs4 import BeautifulSoup


url = 'https://www.kaggle.com/arthurtok/interactive-intro-to-dimensionality-reduction/notebook'
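# Download the raw HTML exactly as the server returns it
urlRead = urllib.request.urlopen(url).read()
soup = BeautifulSoup(urlRead, 'lxml')
# This line raises the AttributeError if find() returns None (no <pre> tag in the parsed HTML)
prePrint = soup.find("pre").contents[0]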
print(prePrint)

Am I reading the webpage incorrectly in urlRead or am I unable to extract the information using BeautifulSoup?
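The error itself comes from `soup.find("pre")` returning `None` when the parsed HTML has no `<pre>` tag. The following is a minimal sketch, not a fix for the Kaggle page, that guards against that case and collects every `<span>` inside the first `<pre>`. The helper name `pre_span_texts` and the two HTML strings are hypothetical samples, and it uses the built-in `html.parser` instead of `lxml` only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup


def pre_span_texts(html):
    """Return (class list, text) for each <span> in the first <pre>, or None if there is no <pre>."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")
    if pre is None:
        # find() returns None when the tag is absent from the HTML it was given;
        # calling .contents on that None is what raises the AttributeError.
        return None
    # "class" is a multi-valued attribute, so BeautifulSoup returns it as a list.
    return [(span.get("class"), span.get_text()) for span in pre.find_all("span")]


# Hypothetical static HTML that already contains the <pre> block.
html_with_pre = '<pre><span class="n">data</span> <span class="o">=</span></pre>'
# Hypothetical HTML where the <pre> block would only be added later by JavaScript.
html_without_pre = '<div id="notebook"></div><script>/* renders notebook */</script>'

print(pre_span_texts(html_with_pre))     # [(['n'], 'data'), (['o'], '=')]
print(pre_span_texts(html_without_pre))  # None
```

Checking for `None` turns the crash into a clear "tag not found" result, which is the signal that the content is not in the HTML that was downloaded.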

Martin Gergov
Learner
  • Opening the site's source, I don't find any <pre> tag?!?
    – Omni Dec 11 '17 at 21:16
  • I am able to see the <pre> tag when inspecting the page.
    – Learner Dec 11 '17 at 21:18
  • Is it possible that the – Omni Dec 11 '17 at 21:29
  • I don't believe so, as I don't have a registered account on the site – Learner Dec 11 '17 at 21:30
  • https://i.stack.imgur.com/Q9sEx.jpg – Omni Dec 11 '17 at 21:31
  • `<pre>` doesn't appear in `View Source`. It does appear in `Inspect`. The `<pre>` tag isn't part of the page as downloaded; it is added afterwards by the JavaScript on the page.
    – Robᵩ Dec 11 '17 at 21:31
  • I agree with Rob; see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 for three ways to scrape pages like this. – Dan-Dev Dec 11 '17 at 21:53
  • Got it, thank you all! – Learner Dec 11 '17 at 21:55
  • I see three problems. The smallest: you have to use a correct `USER-AGENT`. The biggest: the script probably has to log in to the portal. Third: it keeps it in ` – furas Dec 11 '17 at 23:12
  • That was also very useful, thank you furas! – Learner Dec 12 '17 at 04:55
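The diagnosis in the comments (the `<pre>` tag is added by JavaScript, and the site may expect a browser-like `User-Agent`) can be checked with a small sketch like the one below. It assumes a generic `User-Agent` string and uses hypothetical helper names; it only sends the header and tests whether a tag appears in the raw HTML that "View Source" would show. It does not run JavaScript and does not handle logging in, so it cannot by itself retrieve content that is rendered client-side.

```python
import urllib.request

# Assumed browser-like User-Agent string; any realistic value is used the same way.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_raw_html(url, timeout=30):
    """Download the page as the server sends it; no JavaScript is executed."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def tag_in_static_html(html, tag="pre"):
    """Rough substring check for an opening tag in the raw HTML (may give false positives)."""
    return f"<{tag}" in html.lower()
```

For example, `tag_in_static_html(fetch_raw_html(url))` returning `False` for the notebook URL would match Rob's observation: the spans only exist after the page's scripts run, so a tool that drives a real browser (such as Selenium) or the underlying data source is needed instead of `urlopen` plus BeautifulSoup.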

0 Answers