When I try to pickle the data I get a syntax error.
  File "C:\Users\Jeanne\Desktop\PYPDIT\untitled3.py", line 33
    !mkdir transcripts
    ^
SyntaxError: invalid syntax
import requests
from bs4 import BeautifulSoup
import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--'*100)
Now that I can scrape the text, my next step is to save each transcript into a separate file.
I also want to order the text by place (city of origin).
cities = ['Amsterdam', 'Eindhoven', 'Nijmegen', 'Rotterdam', 'Veenendaal']

# Pickle files for later use
!mkdir transcripts
for i, c in enumerate(cities):
    with open("transcripts/" + c + ".txt", "wb") as file:
        pickle.dump(transcripts[i], file)
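
For reference, `!mkdir transcripts` is IPython/Jupyter shell syntax, not Python, so a plain `.py` script stops with a SyntaxError on that line. Below is a minimal sketch of how the same step could look in plain Python, using `os.makedirs` instead. The helper names `group_by_city` and `save_transcripts` are hypothetical, and so is the matching rule: it assumes a paragraph "belongs" to a city when the city name appears anywhere in its text, which may not be what is actually wanted.

```python
import os
import pickle


def group_by_city(paragraphs, cities):
    """Map each city to the paragraphs that mention it.

    Assumption: a case-insensitive substring match is good enough
    to decide that a paragraph is about a city.
    """
    grouped = {city: [] for city in cities}
    for text in paragraphs:
        lowered = text.lower()
        for city in cities:
            if city.lower() in lowered:
                grouped[city].append(text)
    return grouped


def save_transcripts(grouped, out_dir="transcripts"):
    """Pickle each city's paragraphs to <out_dir>/<city>.txt and return the paths."""
    # Plain-Python replacement for the notebook-only `!mkdir transcripts`;
    # exist_ok=True avoids an error when the folder already exists.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for city, texts in grouped.items():
        # The .txt extension follows the question; the content is binary pickle data.
        path = os.path.join(out_dir, city + ".txt")
        with open(path, "wb") as file:
            pickle.dump(texts, file)
        paths.append(path)
    return paths
```

With the scraped paragraphs flattened into one list of strings, something like `save_transcripts(group_by_city(paragraphs, cities))` would write one file per city, and `pickle.load` on a file opened with `"rb"` reads the list back.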