How to Read Data from CSV row by row and store in database

Question

I have a csv file with 20 records that need to stored in a table inside a database. I am trying to read every row line by line but not sure how to pass it to a function which would check the connection with database and store in it. I have created a separate config file for connection object for database.

How should I read the csv line by line and pass every row data to function and store it and carry out the same operation for every other row of csv. My code is as follows:

# This variable stores the insert query to store data in database
query = """INSERT INTO product(product_id, product_name, description, product_value) 
values(%s, %s, %s, %s)"""

def create_product():
data = pd.read_csv('path/to/csv')
df = pd.DataFrame(data)
data_list = []

# How to Write This part?
# How will I pass an entire row in the function call and what to have in the argument like a 
# List or something
for row in df.iterrows():
    # print(row)
    input_data = ",".join(row)
    insert_data = output_data_to_DB(query, input_data, connect_db) # Calling Function
    data_list.append(insert_data)
    print(data_list)

# Called Function Here
def output_data_to_DB(insert_query, output_list, conn):
try:
    cur = conn.cursor()
    cur.execute(insert_query, output_list)
    print("row inserted with valueList : ", output_list)
    output_list.commit()
    cur.close()
    return ""
except Exception as e:
    connect_db.rollback()
    cur.close

I would appreciate any kind of help. I am not that familiar with python programs.

NamsanCode · Accepted Answer · 2022-07-14T01:54:33.210

1

Example: pandas

ref: https://www.listendata.com/2019/06/pandas-read-csv.html

import pandas as pd
# read csv
data = pd.read_csv("your-file.csv")
# read csv and skip the header
data = pd.read_csv("your-file.csv", header = 1)
# read csv, define col names
data = pd.read_csv("your-file.csv", skiprows=1, names=['product_id', 'product_name'])

for row in data.iterrows():
  print(row)
  # process row value as you want
  res = output_data_to_DB(query, res, connect_db)

Example: python CSV module (<- i recommend this)

csv library would be enough and simpler to pass every row data to function.

def create_product():
    data_list = []

    with open('your-file.csv', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # discard header
        for row in reader:
            print(row)  # e.g. `['foo', 'bar']`
            insert_data = output_data_to_DB(query, row, connect_db)
            data_list.append(insert_data)
        print(data_list)

--

Edit

Primary key (auto-incremented column)

Some options to add an auto-incremented value to columns like id can be:

BULK INSERT with an empty id column
- BULK INSERT with identity (auto-increment) column
AUTOINCREMENT keyword
- https://www.sqlite.org/autoinc.html
- How to insert new row to database with AUTO_INCREMENT column without specifying column names?
manually adding values to the csv file or selected row from it (<-i won't recommend this)

edited Jul 14 '22 at 01:54

answered Jul 13 '22 at 09:13

NamsanCode

226
1
4

Can't I use pandas to do the same thing. Because if I need to make any manipulations then creating a dataframe would be nice I think. Also, when I write row in the function call, it shows Type list[str] doesn't have expected attribute commit. What does that signify? – Pranay Mahajan Jul 13 '22 at 09:39
I don't want to add the headings of the column. How can I read remove them from being read? And If I check my database on the mysql workbench, the data that I want doesn't get inserted – Pranay Mahajan Jul 13 '22 at 09:48
1

1. to skip the header, add `next(reader, None)` before `for` loop. 2. if csv has rows that you want to exclude, you'd need to process the csv either with `csv` library or pandas. 3. pandas can do the same thing as you used `pd.read_csv`. i recommended `csv` because it is enough for csv processing. @PranayMahajan – NamsanCode Jul 13 '22 at 10:15
if you need more help, you can share more about how you want to manipulate the data. – NamsanCode Jul 13 '22 at 10:16
Thanks a lot. It is working. One last thing, I want to know that, if I have a column "ID" which is auto-incremented in the database then should I have that column in the CSV file or not. And if not then, when I insert the data the ID is any random value and is not starting from the earlier ID+1. What should I do – Pranay Mahajan Jul 13 '22 at 10:45
@PranayMahajan if you have an auto-incremented column, you can make sql do the job (or you can manually add an `id` values to the cav file or to the iterated `row` but it wouldn't be clean). i assume that the auto-incremented column `id` is the primary key. if you do not explicitly give values for it, **an unsued integer** will be used which might seem like a random value. (ref: https://www.sqlite.org/autoinc.html) also, i just looked up and `BULK INSERT` with an empty id column can work as well. (https://stackoverflow.com/questions/10851065/bulk-insert-with-identity-auto-increment-column) – NamsanCode Jul 14 '22 at 01:11
1

thanks @ZachYoung for your reminder about accepting answers. i'm fairly new to stackoverflow (at least when it comes to actively participating) and it is very exciting to find my answer has been helpful. so please let me know how it goes @pranay! hope this helps – NamsanCode Jul 14 '22 at 01:15
@ZachYoung Is there a way to fetch the "product_id" that is auto incrementing in the database whenever a new record is added. And use that fetched id to be inserted in a new csv file which contains other records and finally putting the whole new csv file in the database like in a new table using the python script. It would be like the "id" is the primary key in one table and used as foreign key in another table. Same thing I am trying to achieve using python script and insert in database. By the way, thanks to the both of you Zach and Seoul Kim, It really helped me a lot. – Pranay Mahajan Jul 14 '22 at 06:21
@PranayMahajan so you wanna load data into more than one table with foreign keys. that question is more about sql than python. i think it can be done by table JOIN. if `product_id` should be used as fk as, say, `staff_product.id`, `staff` and `product` should be joined. – NamsanCode Jul 14 '22 at 09:12
the exact query will depend on what columns your csv files have. if the csv files have fk id, then u can create temporary tables to load data first and insert to another table by joining. like this: https://stackoverflow.com/questions/63883042/import-a-csv-with-foreignkeys – NamsanCode Jul 14 '22 at 09:14
if the csv files do not have columns explicitly telling which is which, other columns will have to be used to sorta 'guess' which product is the right foreign key. in that case, some exception handling might be needed in iteration to pass those rows with undiscoverable fk. – NamsanCode Jul 14 '22 at 09:16
@ZachYoung As told, that I should keep the product_id field as empty for it to be auto incremented in the database but when I keep it empty and run my python script, I get an error stating incorrect integer value for column product_id. How should I resolve this. Zach or Seoul Kim if you could help with this, I would be grateful – Pranay Mahajan Jul 18 '22 at 10:46

How to Read Data from CSV row by row and store in database

1 Answers1

Example: pandas

Example: python CSV module (<- i recommend this)

--

Edit

Primary key (auto-incremented column)