0

I have this code, it reads a csv file for the 2 first columns and appends a list from each row.

            with open(self.selected_file[0], 'rb') as csv_file:
                itemids = []
                csv_reader = csv.reader(csv_file, delimiter=',', quotechar="\"")
                for row in csv_reader:
                    itemids.append([row[0], row[1]])

I have a database that already contains 2 tables, each for every item respectively. I wish to check every row of the csv file (which are pairs of strings). If both strings are unique to their respective table (meaning row[0] is unique to the first item table in my db, and row[1] unique to my second item table), then add those values to their respective tables. I tried the following:

        for item in itemids:
            first_itemids = db_cursor.execute('''SELECT itemid FROM items_one''').fetchall()
            second_itemds = db_cursor.execute('''SELECT itemid from items_two''').fetchall()
            try:
                if not item[0] in first_itemids and not item[1] in second_itemids:
                    db_cursor.execute('''INSERT INTO items_one(itemid) VALUES (?)''', (item[0], ))
                    db_cursor.execute('''INSERT INTO items_two(itemid) VALUES (?)''', (item[1], ))
                    db_conn.commit()

However this check if not item[0] in first_itemids and not item[1] in second_itemids always evaluates to true, so duplicate non unique items are being added. I also tried the other way around if item[0] in first_itemids or item[1] in second_itemids: pass but that also failed

Note: These aren't my actual variable names, I don't know if same column names in different tables under the same DB can cause problems, but mines don't anyway - I just changed it for the sake of readability.

Edit:

I also tried checking each csv row before appending it to my item list like so:

            with open(self.selected_file[0], 'rb') as csv_file:
                itemids = []
                csv_reader = csv.reader(csv_file, delimiter=',', quotechar="\"")
                first_itemids = db_cursor.execute('''SELECT itemid FROM items_one''').fetchall()
                second_itemids = db_cursor.execute('''SELECT itemid from items_two''').fetchall()
                for row in csv_reader:
                    if row[0] not in first_itemds and row[1] not in second_itemids:
                        itemids.append([row[0], row[1]])

And then just insert the lists' value to the DB. No good as well

  • Seems that you call the same SELECT queries for each item in the csv file. I would suggest doing the selects before the for loop. – swbandit Feb 27 '17 at 03:19
  • Is it possible that you have the item in one table but not the other? This could possibly cause duplicate entries. I would also make sure that the values are really exact. A blank space can make the values different. – swbandit Feb 27 '17 at 03:21
  • You're correct - I do call the same SELECT query for each item. This way you can check if there was a duplicate item in the csv file. As for your other question, I check using DB browser - I have both items when trying to run the code. – gal almighty Feb 27 '17 at 03:24

2 Answers2

0

You could use an "upsert" to put unique items in each table:

for item in itemids:
    db_cursor.execute('INSERT INTO items_one(itemid) VALUES (?) WHERE (SELECT changes()=0) AND NOT EXISTS (SELECT itemid FROM items_two WHERE itemid = ?)', (item[0], item[1]))
    db_cursor.execute('INSERT INTO items_two(itemid) VALUES (?) WHERE (SELECT changes()=0)', (item[1],))
    db_conn.commit()

But this will not fail if the other item is in the other table.

This is taken from another question on upsert in SQLite.

You should also be able to extend each query to check if the item is unique in the other table:

db_cursor.execute('''
    INSERT INTO items_one(itemid)
    VALUES (?)
    WHERE (SELECT changes()=0)
        AND NOT EXISTS (
            SELECT itemid FROM items_two
            WHERE itemid = ?
        )
''', (item[0], item[1]))
MattMS
  • 1,106
  • 1
  • 16
  • 32
0

I figured this out... I was comparing item[0] and item[1] which were strings to tuples of unicode, even though I thought python was capable of performing this check.

I changed my if clause to

if not (item[0],) in first_itemids and not (item[1],) in second_itemids: