I want to create a dataframe of Substack posts from all the newsletters I subscribe to. But when I use feedparser with Substack's RSS feeds, each feed only seems to go back about 20 posts, even if that newsletter has hundreds of older posts.
Is there a way to use RSS to get all the old posts too? Or is there another method to get the same data I get from the RSS feed that doesn't involve scraping with BeautifulSoup?

import feedparser
import pandas as pd

# RSS feed URLs for the newsletters I subscribe to
rawrss = ['https://heathercoxrichardson.substack.com/feed', 'https://marcstein.substack.com/feed']

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    # feed.entries only contains the ~20 most recent posts
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary, post.summary_detail, post.content, post.published))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary', 'summary_detail', 'content', 'published'])
print(df)