I'm trying to get some data into a postgreSQL database using Psycopg2. The function I'm using to load the database is as follows:
def load_db():
data = clean_data()
conn = psycopg2.connect(database='database', user='user')
cur = conn.cursor()
for d in data:
publisher_id = (d[5]['publisher_id'])
publisher = (d[4]['publisher'])
cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
(publisher_id, publisher))
conn.commit()
cur.close()
conn.close()
But I am getting the error IndexError: tuple index out of range
and not really sure what I'm doing wrong. In the records I'm trying to input there are a lot of repeated publisher_id
's and publisher
's, hence the WHERE NOT EXISTS
. I'm pretty new to working with databases through python so I'm sure it's some thing simple. Thanks in advance!
UPDATE!
A sample of the data
is as follows:
[{'article_id': 7676933011},
{'web_id': u'world/2015/jul/03/iranian-foreign-minister-raises-prospect-of-joint-action-against-islamic-state'},
{'title': u'Iranian foreign minister raises prospect of joint action against Islamic State'},
{'pub_date': u'2015-07-03T21:30:51Z'},
{'publisher': 'The Guardian'},
{'publisher_id': '1'},
{'author': u'Julian Borger'},
{'author_id': u'15924'},
{'city_info': [{'city_name': u'Vienna',
'country_code': u'US',
'id': 4791160,
'lat': 38.90122,
'lon': -77.26526}]},
{'country_info': [{'country_code': u'IR',
'country_name': u'Islamic Republic of Iran',
'lat': 32.0,
'lon': 53.0},
{'country_code': u'US',
'country_name': u'United States',
'lat': 39.76,
'lon': -98.5}]},
{'org_info': [{'organization': u'Republican'},
{'organization': u'US Congress'},
{'organization': u'Palais Coburg Hotel'},
{'organization': u'Islamic State'},
{'organization': u'United'}]},
{'people_info': [{'people': u'Mohammad Javad Zarif'},
{'people': u'John Kerry'}]}]
The full traceback is:
Traceback (most recent call last):
File "/Users/Desktop/process_text/LoadDB.py", line 69, in <module>
load_db()
File "/Users/Desktop/process_text/LoadDB.py", line 50, in load_db
(publisher_id, publisher))
IndexError: tuple index out of range