Cannot extract 'pre' tag from webpage

Asked Dec 11 '17 at 21:10

Active Feb 02 '20 at 11:35

Viewed 240 times

I am attempting to extract all the span information contained in the 'pre' tag (e.g. `<span class="n">data</span>`) on 'https://www.kaggle.com/arthurtok/interactive-intro-to-dimensionality-reduction/notebook' using BeautifulSoup, but I cannot get the information to show up. I keep getting "AttributeError: 'NoneType' object has no attribute 'contents'".

Here is the code I am currently using:

import urllib.request
from bs4 import BeautifulSoup


url = 'https://www.kaggle.com/arthurtok/interactive-intro-to-dimensionality-reduction/notebook'
urlRead = urllib.request.urlopen(url).read()
soup = BeautifulSoup(urlRead, 'lxml')
prePrint = soup.find("pre").contents[0]
print(prePrint)

Am I reading the webpage incorrectly in urlRead or am I unable to extract the information using BeautifulSoup?

edited Feb 02 '20 at 11:35

Martin Gergov


asked Dec 11 '17 at 21:10

Learner

Opening the site's source I don't find any `<pre>` tag ?!? – Omni Dec 11 '17 at 21:16
I am able to see the `<pre>` tag when inspecting the page – Learner Dec 11 '17 at 21:18
Is it possible that the – Omni Dec 11 '17 at 21:29
I don't believe so, as I don't have a registered account on the site – Learner Dec 11 '17 at 21:30
https://i.stack.imgur.com/Q9sEx.jpg – Omni Dec 11 '17 at 21:31

`<pre>` doesn't appear in `View Source`. It does appear in `Inspect`. The `<pre>` tag isn't part of the page as downloaded; it is subsequently added by the JavaScript on the page.

– Robᵩ Dec 11 '17 at 21:31
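
To make Robᵩ's point concrete, here is a minimal sketch (not from the thread) that parses two hand-written HTML strings: one standing in for the page as downloaded, where the `<pre>` is absent, and one standing in for the DOM after the scripts have run. Both strings and the helper name `first_pre_text` are made up for illustration, and the built-in `html.parser` is used instead of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the real page: the first mimics what urlopen()
# receives, the second mimics what the browser's Inspect view shows after
# JavaScript has inserted the <pre> element.
DOWNLOADED_HTML = (
    "<html><body><div id='app'></div>"
    "<script>/* builds the page */</script></body></html>"
)
RENDERED_HTML = (
    "<html><body><div id='app'>"
    "<pre><span class='n'>data</span></pre>"
    "</div></body></html>"
)


def first_pre_text(html):
    """Return the text of the first <pre> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")  # find() returns None when nothing matches
    if pre is None:
        return None
    return pre.get_text()


print(first_pre_text(DOWNLOADED_HTML))  # no <pre> in the raw download
print(first_pre_text(RENDERED_HTML))    # <pre> is present once rendered
```

Checking the result of `find()` before touching `.contents` avoids the `AttributeError`, but it does not make the missing tag appear; the content still has to be fetched from wherever the page's scripts load it.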


I agree with Rob, see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 for 3 ways to scrape pages like this – Dan-Dev Dec 11 '17 at 21:53
Got it, thank you all! – Learner Dec 11 '17 at 21:55
I see three problems. The smallest: you have to use a correct `User-Agent`. The biggest: the script probably has to log in to the portal. Third: the page keeps the content in an `<iframe>` which has its own URL, https://www.kaggle.io/svf/1918919/3e1551ecf3d4fd95e467dab83112b9ec/__results__.html, so the server sends it as a separate file and you have to download it manually. – furas Dec 11 '17 at 23:12
That was also very useful, thank you furas! – Learner Dec 12 '17 at 04:55
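
A rough sketch of what furas describes, assuming the separately served document can be fetched without logging in (the thread does not confirm this, and the URL may no longer be reachable). The helper names `build_request`, `extract_pre_spans` and `fetch_pre_spans`, and the `Mozilla/5.0` User-Agent string, are assumptions chosen for illustration.

```python
import urllib.request

from bs4 import BeautifulSoup

# URL quoted in furas's comment; it may have expired since then.
RESULTS_URL = (
    "https://www.kaggle.io/svf/1918919/"
    "3e1551ecf3d4fd95e467dab83112b9ec/__results__.html"
)


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request that sends an explicit User-Agent header."""
    # The default value is an assumed placeholder, not a known requirement.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_pre_spans(html):
    """Return the text of every <span> found inside any <pre> tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        span.get_text()
        for pre in soup.find_all("pre")
        for span in pre.find_all("span")
    ]


def fetch_pre_spans(url=RESULTS_URL):
    """Download the separately served document and pull out its spans."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return extract_pre_spans(response.read())
```

Calling `fetch_pre_spans()` performs the actual download; `extract_pre_spans` can be tried on any HTML string first to check the parsing step on its own.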
