I'm new to coding and this my first project. So far I've pieced together what I have through Googling, Tutorials and Stack.
I'm trying to add data from a pandas df of scraped RSS feeds to a remote sql database, then host the script on heroku or AWS and have the script running every hour.
Someone on here recommend that I use APScheduler as in this post.
I'm struggling though as there aren't any 'dummies' tutorials around APScheduler. This is what I've created so far.
I guess my question is does my script need to be in a function for APScheduler to trigger it or can it work another way.
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
@sched.scheduled_job('interval', minutes=1)
sched.configure()
sched.start()
import pandas as pd
from pandas.io import sql
import feedparser
import time
rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
'https://www.yahoo.com/news/rss/',
'http://www.huffingtonpost.co.uk/feeds/index.xml',
'http://feeds.feedburner.com/TechCrunch/',
'https://www.uktech.news/feed'
]
time = time.strftime('%a %H:%M:%S')
summary = 'text'
posts = []
for url in rawrss:
feed = feedparser.parse(url)
for post in feed.entries:
posts.append((time, post.title, post.link, summary))
df = pd.DataFrame(posts, columns=['article_time','article_title','article_url', 'article_summary']) # pass data to init
df.set_index(['article_time'], inplace=True)
import pymysql
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://<username>:<host>:3306/<database_name>?charset=utf8', encoding = 'utf-8')
engine.execute("INSERT INTO rsstracker VALUES('%s', '%s', '%s','%s')" % (time, post.title, post.link, summary))
df.to_sql(con=engine, name='rsstracker', if_exists='append') #, flavor='mysql'