0

Very new to python. I am trying to get the data from an RSS feed, parse the data and then insert the data to a database. My short bit of code gets the correct items and I can print the results but I can only get the last item in the RSS feed to post to the database. I believe I may have defined "html" and "link" incorrectly. I want the item.title and item.link to populate a list that will then get entered into the database in the correct order. Any help much appreciated.

import sys
import requests
import urllib2
import feedparser
import psycopg2
import psycopg2.extras
from psycopg2.extras import execute_values
import time

url = "https://www.ferc.gov/xml/whats-new.xml"
response = urllib2.urlopen(url).read()


#saving the xml file
response = requests.get(url)
#with open('whats_hot.xml', 'wb') as file:
#    file.write(response.content)


d = feedparser.parse('https://www.ferc.gov/xml/whats-new.xml')


for item in d.entries:
    print "------"
    print item.published
    print item.title
    print item.link
    html = item.published + item.title
    link = item.link



con = psycopg2.connect(database="xx", 
user="xx", password="xx", host="127.0.0.1", 
port="5432")  
print("Database opened successfully")

cur = con.cursor()



#try:
psycopg2.extras.execute_values(cur,
"insert into ferc_hots (link,html) values %s",
      [(link,html)])
#except psycopg2.IntegrityError:
#     print 'Duplicate values found.  Insert was not successful'

con.commit()  
print("Records inserted successfully")  
con.close()
Gil Hamilton
  • 11,973
  • 28
  • 51
wiedhas
  • 57
  • 7
  • `link` and `html` get overridden every time you go through the loop. You only do the insertion after your loop terminates, so it will use the last values of `link` and `html` for the insertion. How many links are you typically inserting? Also, if you want to populate a list, why aren't you appending `html` and `link` to a list? As for the insertion itself, have a look at [this question](https://stackoverflow.com/questions/2271787/psycopg2-postgresql-python-fastest-way-to-bulk-insert). – Mihai Chelaru May 30 '19 at 16:51

3 Answers3

1

Your insert statement needs to be inside the for loop as well. Otherwise, you only insert the last record.

con = psycopg2.connect(database="xx", 
user="xx", password="xx", host="127.0.0.1", 
port="5432")  
print("Database opened successfully")

cur = con.cursor()

for item in d.entries:
    print "------"
    print item.published
    print item.title
    print item.link
    html = item.published + item.title
    link = item.link
    psycopg2.extras.execute_values(cur,"insert into ferc_hots (link,html) values %s",[(link,html)])

con.commit()  
print("Records inserted successfully")  
con.close()

Another option is to save a list of records, and insert them together at the end.

Anh Do Hoang
  • 156
  • 5
0

The argument list to execute_values needs to be a "sequence of sequences". Which is a list of lists (or list of tuples). In your case, you're only leaving the final item values in html and link and so only providing a single item .

You'll want something like this:

args = []
for item in d.entries:
    print "------"
    print item.published
    print item.title
    print item.link
    args.append([item.published + item.title, item.link])

Or, in one fell swoop:

args = [[item.published + item.title, item.link] for item in d.entries]

Then the insert is something like:

psycopg2.extras.execute_values(cur,
          "insert into ferc_hots (link,html) values %s",
          args)
Gil Hamilton
  • 11,973
  • 28
  • 51
0

html and link are just single string values. They keep changing through your loop, but by the time you go to insert, they just have the last value given. You need to keep a list of value tuples that will be passed to your insert. Note the double parentheses are because a tuple is being appended to the values list. A tuple is defined like (item, item), so you are appending (item, item) and not just item.

values = []
for item in d.entries:
    print "------"
    print item.published
    print item.title
    print item.link
    values.append((item.link, item.published + item.title))

...

psycopg2.extras.execute_values(cur,
    "insert into ferc_hots (link,html) values %s",
        values)
drxl
  • 364
  • 1
  • 7
  • That worked! I originally made two empty lists and then appended the items into them. I then tried to insert them into the database but had syntax problems. Does creating one list work because then the append adds two values separated by commas? So when the insert command lists the two table entries, the list populates the table with the corresponding values? – wiedhas May 30 '19 at 21:04
  • the execute_values function is looking for a list of tuples. The tuples need to have their items ordered like the insert. So Insert into table1 (colA,colB) values %s needs to have a list of tuples like (ColAValue, ColBValue) you can have as many values in the tuple, they just need to correspond to the insert statement. The psycopg2 module then handles making sure they end up in the correct values (ColAValue, ColBValue) format. – drxl May 31 '19 at 11:22