1

I'm new to coding and this my first project. So far I've pieced together what I have through Googling, Tutorials and Stack.

I'm trying to add data from a pandas df of scraped RSS feeds to a remote sql database. I've been using a combo of this post and this post to fix my error but the answers don't seem to work with my problem.

Does anyone have a potential solution to the error:


AttributeError Traceback (most recent call last) in () 19 cursor.execute(sql) 20 ---> 21 sql.to_sql(df, con=conn, name='rsstracker', if_exists='append', flavor='mysql') 22 23 # disconnect from server

AttributeError: 'str' object has no attribute 'to_sql'

import pandas as pd
from pandas.io import sql
import feedparser
import time

rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
          'https://www.yahoo.com/news/rss/',
          'http://www.huffingtonpost.co.uk/feeds/index.xml',
          'http://feeds.feedburner.com/TechCrunch/',
         ]

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))
df = pd.DataFrame(posts, columns=['title', 'link', 'summary']) # pass data to init

import pymysql

# Open database connection
db = pymysql.connect(host="host", port=##, user="username", password="password", db="sql#######" )

# prepare a cursor object using cursor() method
cursor = db.cursor()


# Drop table if it already exist using execute() method.
cursor.execute("DROP TABLE IF EXISTS rsstracker")

# Create table as per requirement
sql = """CREATE TABLE rsstracker(
   article_title  varchar(255),
   article_url  varchar(1000),
   article_summary varchar(1000))"""

cursor.execute(sql)

sql.to_sql(df, con=conn, name='rsstracker', if_exists='append', flavor='mysql')

# disconnect from server
db.close()
Nick Duddy
  • 910
  • 6
  • 20
  • 36
  • Error seems clear as `to_sql` is a pandas dataframe method but you are running it on a sql string – Parfait Aug 30 '17 at 21:14

1 Answers1

1

The line should be :

df.to_sql(con=db, name='rsstracker', if_exists='append', flavor='mysql')

But you may need to use SQLAlchemy to make this export possible (doc)


You can try to use SQLAlchemy with pymysql like in the following lines :

import pymysql
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://<username>:<password>@<host>[:<port>]/<dbname>')
engine.execute("DROP TABLE IF EXISTS rsstracker")
engine.execute("""CREATE TABLE rsstracker(
   article_title  varchar(255),
   article_url  varchar(1000),
   article_summary varchar(1000))""")


df.to_sql(con=engine, name='rsstracker', if_exists='append', , flavor='mysql')
PRMoureu
  • 12,817
  • 6
  • 38
  • 48
  • I've tried this and now get the error: 'ValueError: database flavor mysql is not supported' Why do you think I may need SQLAlchemy? Is there an issue with the way I'm approaching the problem? – Nick Duddy Aug 31 '17 at 19:33
  • it seems other flavors are not supported, so i guess you should switch to SQLAlchemy, or export the datas from dataframes with another way – PRMoureu Aug 31 '17 at 19:38
  • do I swap 'pymysql' with 'SQLAlchemy'? – Nick Duddy Sep 01 '17 at 10:18
  • @NickDuddy, could you try with the edited solution ? – PRMoureu Sep 01 '17 at 16:50
  • Just tried it, got the following error after I removed the : from the port as I kept getting int error. '"OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'sql8.freemysqlhosting.net[3306]' ([Errno 11001] getaddrinfo failed)")"' – Nick Duddy Sep 03 '17 at 17:27
  • 1
    Just changed '@[:]' to '@:'. Now getting a different error: 'UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 151: ordinal not in range(256)' – Nick Duddy Sep 03 '17 at 17:53
  • @NickDuddy i think you can resolve this by passing an option at the end of the `create_engine` : `engine = create_engine('mysql+pymysql://:@[:]/?charset=utf8')` – PRMoureu Sep 03 '17 at 18:04
  • check in this answer, maybe you need to specify encoding too : https://stackoverflow.com/a/33443380/6655211 – PRMoureu Sep 03 '17 at 18:06
  • that has solved that encoding problem. Now have an error that looks like this: 'InternalError: (pymysql.err.InternalError) (1054, "Unknown column 'index' in 'field list'") [SQL: 'INSERT INTO rsstracker (`index`, title, link, summary) VALUES (%(index)s, %(title)s, %(link)s, %(summary)s)'] [parameters:' it then goes on to display all the elements from the feed one after the other. – Nick Duddy Sep 03 '17 at 18:12
  • check the names you choose to create the table, maybe add an `index` field, then it was `article_title` instead of `title`, same for the next columns – PRMoureu Sep 03 '17 at 18:29
  • I've set a date index and changed the column headers. I got a different error relating to the content in the summery section. So i've added this as a variable 'summary = 'text''. And the content now goes into the data base. Thanks for you help with this. – Nick Duddy Sep 05 '17 at 15:36
  • @NickDuddy glad you achieve what you want, thanks for the feedback – PRMoureu Sep 05 '17 at 16:46