As an absolute newbie on the topic of using python, I stumbled over a few difficulties using the newspaper library extension. My goal is to use the newspaper extension on a regular basis to download all new articles of a German news website called "tagesschau" and all articles from CNN to build a data stack I can analyze in a few years. If I got it right I could use the following commands to download and scrape all articles into the python library.
import newspaper
from newspaper import news_pool
tagesschau_paper = newspaper.build('http://tagesschau.de')
cnn_paper = newspaper.build('http://cnn.com')
papers = [tagesschau_paper, cnn_paper]
news_pool.set(papers, threads_per_source=2) # (3*2) = 6 threads total
news_pool.join()`
If that's the right way to download all articles, so how I can extract and save those outside of python? Or saving those articles in python so that I can reuse them if I restart python again?
Thanks for your help.