
I'm trying to extract columns from a CSV file and insert them into a MySQL table.

However, I'm getting the following error, which points at the title (column 3, line 1):

 'Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation, ' at line 1")

csv data

1038819,Discoverable,Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation,Conference Proceeding,"Vagliano, Iacopo; Galke, Lukas; Mai, Florian; Scherp, Ansgar",10.1145/3267471.3267476,
1037162,Discoverable,Performance Comparison of Ad-hoc Retrieval Models over Full-text vs. Titles of Documents (Forthcoming),Conference Proceeding,"Saleh, Ahmed; Beck, Tilman; Galke, Lukas; Scherp, Ansgar",,1893/28014

I think it has something to do with the fact that the Authors column contains ',' characters, and that the other columns contain a mix of different spacing and symbols.

Python

import pymysql
import csv

csv_data= csv.reader(open('Book1.csv'))

conn=pymysql.connect("localhost","root", "root", "test")
cursor=conn.cursor()
print ("Done")
for row in csv_data:

    cursor.execute('INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle ) VALUES({}, {}, {}, {}, {}, {}, {})'.format(row[0], row[1], row[2], row[3], row[4], row[5], row[6]))
    conn.commit()

cursor.close()
John Doi2021

1 Answer


Notice that when you compose the SQL using string formatting, the result lacks quotation marks around the values to be inserted:

In [215]: ('INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle ) '
           'VALUES({}, {}, {}, {}, {}, {}, {})').format('manchego', 'brie', 'gruyere', 'stilton', 'camembert', 'bleu', 'asiago')

Out[215]: 'INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle ) VALUES(manchego, brie, gruyere, stilton, camembert, bleu, asiago)'

Proper SQL would require quotation marks around each string value:

INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle )
VALUES("manchego", "brie", "gruyere", "stilton", "camembert", "bleu", "asiago")

Rather than adding the proper quotation marks yourself, the best practice is to use parameterized SQL:

sql = '''INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle)
         VALUES(%s, %s, %s, %s, %s, %s, %s)'''
cursor.execute(sql, row)

Notice that cursor.execute(sql, row) takes two arguments, sql and row. They are passed as separate arguments rather than combined into one string with sql % row. The database driver then handles the quoting and escaping of each value for you. Not only is this simpler, it also protects against SQL injection.
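To make this concrete, here is a minimal, runnable sketch of the same parameterized pattern. It rests on some assumptions: it uses the standard library's sqlite3 with an in-memory database as a stand-in so that it runs without a MySQL server, and sqlite3 spells its placeholders `?` where pymysql uses `%s`. The helper name `load_rows` and the TEXT column types are made up for illustration; the two rows are the ones from the question. It also shows that csv.reader already keeps the quoted Authors value as a single field, so the commas inside it are not the cause of the error.

```python
import csv
import io
import sqlite3

# The two rows from the question; the quoted Authors field contains commas.
CSV_TEXT = '''1038819,Discoverable,Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation,Conference Proceeding,"Vagliano, Iacopo; Galke, Lukas; Mai, Florian; Scherp, Ansgar",10.1145/3267471.3267476,
1037162,Discoverable,Performance Comparison of Ad-hoc Retrieval Models over Full-text vs. Titles of Documents (Forthcoming),Conference Proceeding,"Saleh, Ahmed; Beck, Tilman; Galke, Lukas; Scherp, Ansgar",,1893/28014
'''


def load_rows(conn, csv_file):
    """Insert every CSV row into `output` with a parameterized INSERT (hypothetical helper)."""
    # Assumed schema for the demo; the real MySQL table may use other types.
    conn.execute(
        '''CREATE TABLE IF NOT EXISTS output
           (Output_ID INTEGER, Status TEXT, Title TEXT, Type TEXT,
            Authors TEXT, DOI TEXT, Handle TEXT)''')
    # sqlite3 uses ? placeholders; with pymysql the same statement would use %s.
    sql = '''INSERT INTO output (Output_ID, Status, Title, Type, Authors, DOI, Handle)
             VALUES (?, ?, ?, ?, ?, ?, ?)'''
    rows = list(csv.reader(csv_file))
    # The values travel separately from the SQL text, so no manual quoting is needed.
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(':memory:')
inserted = load_rows(conn, io.StringIO(CSV_TEXT))
stored = conn.execute(
    'SELECT Title, Authors, DOI, Handle FROM output ORDER BY Output_ID').fetchall()
```

With pymysql, the equivalent loop would presumably keep the question's connection code and call cursor.execute(sql, row) with the `%s` version of the statement shown above.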

unutbu
  • Thank you for the detailed explanation. Makes complete sense. Is there an easy way of combining that approach with handling annoying symbols like '/' and '.'? – John Doi2021 Feb 11 '19 at 22:45
  • Oh, and I'm getting "Warning: Data truncated" for the columns Title, DOI and Authors. – John Doi2021 Feb 11 '19 at 22:48
  • I'm not sure what's annoying about `'/'` and `'.'`. If you feel it is related to the current question, explain and we can pursue it here. If it is unrelated, please start a new question. The `Warning: Data truncated` message happens when the database table has a column of VARCHAR(N) data type which is too small to contain the string you are trying to insert. You'll need to alter the column's data type (either to a larger VARCHAR or to allow arbitrary text) to fix this. – unutbu Feb 11 '19 at 22:57
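
As a follow-up to the truncation warning discussed in the comments, one way to decide how wide each column needs to be is to measure the longest value per column in the CSV before altering the table. The sketch below is only an illustration: the helper name `max_field_lengths` and the sample row are made up, and the column names are taken from the INSERT statement in the question.

```python
import csv
import io

# Column names as used in the question's INSERT statement.
COLUMNS = ['Output_ID', 'Status', 'Title', 'Type', 'Authors', 'DOI', 'Handle']


def max_field_lengths(csv_file, columns):
    """Return the longest value length (in characters) seen for each column."""
    longest = {name: 0 for name in columns}
    for row in csv.reader(csv_file):
        for name, value in zip(columns, row):
            longest[name] = max(longest[name], len(value))
    return longest


# Hypothetical one-row sample; in practice pass open('Book1.csv', newline='').
sample = io.StringIO(
    '1,Discoverable,"A title, with a comma",Article,"Doe, Jane",10.1000/xyz,\n')
lengths = max_field_lengths(sample, COLUMNS)
```

Any column whose measured length exceeds its declared VARCHAR(N) would be a candidate for widening, for example with something like ALTER TABLE output MODIFY Title TEXT; in MySQL.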