
I'm learning Python. To teach myself, I've decided to build a tool that gathers RSS feeds and stores each entry's title, URL and summary in a database (I will later build a tool to access the data and scrape the pages).

So far, I have created a local version that gathers content from a list of RSS feeds and puts it into a pandas DataFrame.

What I'm trying to understand next is which tools I need to turn this local script into one that runs on a schedule (for example, every 30 minutes) and adds the newly found data to the database.

Any direction would be helpful.

import feedparser
import pandas as pd

rawrss = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
    ]

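# Collect (title, link, summary) for every entry in every feed
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        # Some feeds omit fields, so fall back to an empty string
        posts.append((post.get('title', ''), post.get('link', ''), post.get('summary', '')))

# Build the DataFrame from the collected tuples
df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])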

df
Nick Duddy
  • If you include the code for adding to the DB in this script, you could then add the script as a cron job on your OS – omu_negru Aug 17 '17 at 12:32
  • This: https://stackoverflow.com/questions/22715086/scheduling-python-script-to-run-every-hour-accurately – Mekicha Aug 17 '17 at 12:33
  • @omu_negru Would this be using the OS tools in Python? I've seen 'import os' in scripts but never actually used it, or really knew what it did. – Nick Duddy Aug 17 '17 at 12:34
  • cron means leveraging the OS tools, assuming your OS is Linux. The Python script itself has no need to hold the scheduling code – omu_negru Aug 17 '17 at 12:36
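
The suggestion in the comments (put the database code in the script and let the OS handle scheduling) could look roughly like the sketch below. It is only a minimal illustration under several assumptions: SQLite is used as the database via the standard-library `sqlite3` module, and the table name `posts` and the helper names `init_db` and `store_posts` are hypothetical, not part of the original script. The link is used as the primary key so that re-running the script only adds entries that are not stored yet.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) posts table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            link    TEXT PRIMARY KEY,  -- the URL identifies an entry
            title   TEXT,
            summary TEXT
        )
        """
    )
    conn.commit()


def store_posts(conn, posts):
    """Insert (title, link, summary) tuples and return how many were new.

    INSERT OR IGNORE skips rows whose link is already in the table,
    so running this repeatedly does not create duplicates.
    """
    count_sql = "SELECT COUNT(*) FROM posts"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO posts (title, link, summary) VALUES (?, ?, ?)",
        posts,
    )
    conn.commit()
    after = conn.execute(count_sql).fetchone()[0]
    return after - before


if __name__ == "__main__":
    # In-memory database for demonstration; a real run would pass a file path.
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    sample = [("Example title", "http://example.com/a", "Example summary")]
    print("new rows:", store_posts(conn, sample))
    conn.close()
```

In the question's script, the `posts` list of tuples could be passed to `store_posts` directly, with `sqlite3.connect` pointed at a file such as `feeds.db` (a made-up name). On Linux, a crontab entry along the lines of `*/30 * * * * /usr/bin/python3 /path/to/rss_to_db.py` would then run it every 30 minutes; both paths there are placeholders.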

0 Answers