
Trying to execute this code for scraping the specific websites / RSS feeds mentioned below, I keep getting:

Traceback (most recent call last):
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <module>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <listcomp>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 17, in url_to_transcript
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
AttributeError: 'NoneType' object has no attribute 'find_all'

Please advise.

import requests
from bs4 import BeautifulSoup
import pickle

def url_to_transcript(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
    print(url)
    return text

# URLs of transcripts in scope

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

transcripts = [url_to_transcript(u) for u in urls]
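For context: find() returns None when nothing in the document matches, so the chained .find_all('p') ends up being called on None. A minimal sketch (with hypothetical HTML, not taken from the feeds) reproduces the same error:

from bs4 import BeautifulSoup

# Hypothetical minimal document with no element of class "itemcontent":
soup = BeautifulSoup("<html><body><p>hello</p></body></html>", "lxml")

match = soup.find(class_="itemcontent")
print(match)  # None, because nothing in the document has that class
# match.find_all('p')  # raises: AttributeError: 'NoneType' object has no attribute 'find_all'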

1 Answer


The HTML returned is not the same as what you see on the page: these URLs serve the RSS XML itself, so there is no element with class="itemcontent" and soup.find() returns None. You can use the following instead:

import requests
from bs4 import BeautifulSoup
# import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
        # Each <description> in the feed holds the item's <p> paragraphs;
        # [1:] skips the first <description>, which describes the feed itself.
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--' * 100)
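If you would rather keep the url_to_transcript structure from the question, a minimal sketch along the same lines (reusing the description selector and [1:] slice from the code above; not part of the original answer) could look like:

import requests
from bs4 import BeautifulSoup

def url_to_transcript(url):
    # The feed is RSS/XML, so select the <description> elements instead of
    # looking for a class="itemcontent" element that only exists on the
    # rendered article pages.
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    text = [[p.text for p in desc.select('p')] for desc in soup.select('description')[1:]]
    print(url)
    return text

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

transcripts = [url_to_transcript(u) for u in urls]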
QHarr
  • Thank you, this has worked beautifully. – Sumus Vegann May 02 '20 at 08:57
  • # # Pickle files for later use # # Make a new directory to hold the text files # !mkdir transcripts # for i, c in enumerate(comedians): # with open("transcripts/" + c + ".txt", "wb") as file: # pickle.dump(transcripts[i], file) Can you help me with the next step: how do I pickle the selected text? – Sumus Vegann May 02 '20 at 10:07
  • Hiya, please open a new question. Add your attempt and explain what isn't working and what you have tried. – QHarr May 02 '20 at 10:10
  • I have made a new question; maybe you could help me out with this next step: https://stackoverflow.com/questions/61558832/how-do-i-pickle-the-scrape-data-instead-of-printing-the-data – Sumus Vegann May 02 '20 at 11:48
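For reference, the snippet quoted in the comment above, rewritten as a runnable sketch outside a notebook (the placeholder transcripts and labels below are illustrative only, and os.makedirs replaces the notebook's !mkdir):

import os
import pickle

# Placeholder data standing in for the scraped transcripts and for the
# "comedians" list used in the quoted snippet.
transcripts = [['eerste tekst'], ['tweede tekst']]
labels = ['nosnieuwstech', 'nosnieuwsalgemeen']

# Make a new directory to hold the text files.
os.makedirs('transcripts', exist_ok=True)

# Pickle each transcript to its own file for later use.
for i, label in enumerate(labels):
    with open("transcripts/" + label + ".txt", "wb") as file:
        pickle.dump(transcripts[i], file)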