0

How do I perform a bulk insert with only some of the columns getting data from a CSV file?

My code is currently like this (apologies for the bad pseudo code):

        with open("some_csv_file", "r") as csvFile:
            # load csv data
    
            for row in csvFile:
                column1_data = row[0]
                column2_data = row[1]
                column3_data = row[2]
        
                # how to bulk insert this?? All data is the same except data loaded from csv
                pyodbc.execute("INSERT INTO some_table(Code, column1, column2, column3, 
                some_other_column, some_other_column) VALUES(?, ?, ?, ?, ?, ?)", 'xy', column1_data, column2_data, column3_data, 'abc', 123)
        
                pyodbc.commit()    

          pyodbc.close()

I have seen other answer pointing towards pyodbc "executemany" but I'm struggling to figure out how to load the csv data for the particular columns that change

Thanks

Pdeuxa
  • 651
  • 7
  • 27
decprog
  • 167
  • 2
  • 5
  • 22

2 Answers2

0

One way is to make your SQL statement into one that inserts many rows. This is the format:

INSERT INTO some_table (Col1name, Col2name, Col4name)
VALUES
(Row1Col1, Row1Col2, Row1Col4),
(Row2Col1, Row2Col2, Row2Col4)

See this post: Inserting multiple rows in a single SQL query?

To do that within your current code structure:

#make SQL statement with insert many
sql = "INSERT INTO some_table (Col1name, Col2name, Col3name) "
sql += "VALUES " 
for row in CSVfile:
    sql += "(%s, %s, %s), " % (row[0], row[1], row[2])

# chop of the last comma from the sql insert statement
sql = sql[0:-2]

# insert like normal
pyodbc.execute(sql)
Ben Dickson
  • 306
  • 2
  • 7
0

For .executemany() your parameter values should be a list of tuples with each tuple representing a row. You could do something like this:

>>> fake_csv_file = [['col1_val1', 'col2_val1'], ['col1_val2', 'col2_val2']]
>>> param_data = []
>>> for row in fake_csv_file:
    param_data.append((row[0], row[1], 'gord', 'was', 'here'))
>>> from pprint import pprint
>>> pprint(param_data)
[('col1_val1', 'col2_val1', 'gord', 'was', 'here'),
 ('col1_val2', 'col2_val2', 'gord', 'was', 'here')]
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418