When I try to pickle the data I get a syntax error.

 File "C:\Users\Jeanne\Desktop\PYPDIT\untitled3.py", line 33
    !mkdir transcripts
    ^
SyntaxError: invalid syntax

import requests
from bs4 import BeautifulSoup
import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--'*100)

Now that I can scrape the text, my next step is to save the transcript into a separate file.

I also want to order the text by place (city of origin).

Cities = ['Amsterdam', 'Eindhoven', 'Nijmegen', 'Rotterdam', 'Veenendaal']

# Pickle files for later use

!mkdir transcripts

 for i, c in enumerate(cities):
     with open("transcripts/" + c + ".txt", "wb") as file:
         pickle.dump(transcripts[i], file)
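For context on the SyntaxError: `!mkdir transcripts` is IPython/Jupyter shell-escape syntax, so it is not valid in a plain `.py` script. A minimal sketch of creating the directory from Python itself, using only the standard library (the helper name `ensure_dir` is hypothetical, introduced here for illustration):

```python
from pathlib import Path


def ensure_dir(name):
    """Create the directory (and any missing parents) and return its path.

    exist_ok=True means no error is raised if the directory already exists.
    """
    path = Path(name)
    path.mkdir(parents=True, exist_ok=True)
    return path


# Replaces the "!mkdir transcripts" line in a regular script.
out_dir = ensure_dir("transcripts")
```

Calling `ensure_dir("transcripts")` a second time is harmless, which makes the script safe to rerun.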
  • I am not sure about why you have !mkdir transcripts in pickle section. You could generate directories using the following example: https://stackoverflow.com/a/14364249/6241235 – QHarr May 02 '20 at 12:42
  • If we treat the mkdir part as a separate problem, I still do not understand how to dump the text to a pickle file instead of printing it. Can you give me some input on that? – Sumus Vegann May 02 '20 at 13:38
  • if you remove that line, are there still problems? – QHarr May 02 '20 at 14:48
  • The first part of the script works fine after I remove the !mkdir line, so the scraped text appears in the output, but then I get the following: Traceback (most recent call last): File "C:\Users\Jeanne\Desktop\PYPDIT\untitled3.py", line 26, in <module> for i, c in enumerate(cities): NameError: name 'cities' is not defined. The file is also not pickled. – Sumus Vegann May 02 '20 at 15:36
  • So the main problem now is to get the text into a pickle file instead of printing it. – Sumus Vegann May 02 '20 at 15:42
  • your code isn't printing. It is pickle dumping to file. – QHarr May 02 '20 at 16:02
  • That last part, the pickle dumping to file, is not producing the file. I seem to be making a mistake there. – Sumus Vegann May 02 '20 at 16:34
  • what error messages are you getting and where and how have you stored the appropriate content in transcripts list? – QHarr May 02 '20 at 18:05
  • Traceback (most recent call last): File "C:\Users\Jeanne\Desktop\PYPDIT\maximscrape1.py", line 27, in <module> for i, c in enumerate(comedians): NameError: name 'comedians' is not defined – Sumus Vegann May 02 '20 at 18:32
  • The text is not stored in transcripts; it is only in page and in soup, and it is printed by print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]]). How do I pickle it instead of printing it? – Sumus Vegann May 02 '20 at 18:34
  • you are trying to create city name based files with contents pulled by indexing into transcripts list. You need to tell us how the web-scraping populates the transcripts list. Your pickle code is not the problem per se. Also, comedians is not the same as cities. – QHarr May 02 '20 at 20:07
  • "you are trying to create city name based files with contents pulled by indexing into transcripts list": correct. The cities, not the comedians, should be the keys in the table that gets pickled. – Sumus Vegann May 03 '20 at 11:09
  • You can find the original exercise here: https://www.youtube.com/watch?time_continue=2167&v=xvqsFTUsOmc&feature=emb_logo – Sumus Vegann May 03 '20 at 11:33
  • hi, all the relevant code to reproduce the issue in line with [mcve] should be here in the question. – QHarr May 03 '20 at 12:25
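
As the comments point out, nothing in the script ever fills `transcripts`, and the list is defined as `Cities` but read as `cities`. The sketch below shows one way the pieces could fit together. It rests on assumptions that are not in the question: the rule for assigning text to a city (a plain substring match on the city name), the helper names `group_by_city` and `pickle_transcripts`, the `.pkl` extension, and the placeholder paragraphs standing in for the scraped text.

```python
import pickle
from pathlib import Path

# Lowercase name, so it matches the name used in the loop.
cities = ['Amsterdam', 'Eindhoven', 'Nijmegen', 'Rotterdam', 'Veenendaal']


def group_by_city(paragraphs, cities):
    """Map each city to the paragraphs that mention it.

    Assumption: a paragraph belongs to a city if the city name occurs in it.
    """
    return {c: [p for p in paragraphs if c in p] for c in cities}


def pickle_transcripts(transcripts, out_dir="transcripts"):
    """Write one pickle file per city and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for city, texts in transcripts.items():
        path = out / f"{city}.pkl"
        with open(path, "wb") as f:  # pickle needs binary mode
            pickle.dump(texts, f)
        paths.append(path)
    return paths


# Placeholder text only. In the real script this would be the i.text values
# collected into a list inside the requests loop instead of being printed.
paragraphs = ["Placeholder item about Amsterdam.",
              "Placeholder item about Eindhoven."]

transcripts = group_by_city(paragraphs, cities)
pickle_transcripts(transcripts)
```

A dict keyed by city is used here instead of indexing a list with `transcripts[i]`, so a city with no matching text simply gets an empty list. Each file can be read back with `pickle.load(open(path, "rb"))`.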
