
I'm trying to write 15 MB worth of data to an RDS instance running a PostgreSQL database, but I'm finding it really slow... It took 15+ minutes to write all the data to the instance. Has anyone had experience with writing a lot of data row by row to an RDS instance? Thank you!

import csv

# Assuming the table is already created
def handler(file_with_many_many_rows, con):
    cur = con.cursor()
    reader = csv.reader(file_with_many_many_rows)
    insert_query = "INSERT INTO table_test VALUES (%s, %s, %s);"

    # One INSERT round trip per CSV row.
    for line in reader:
        vals = (line[0], line[1], line[2])
        cur.execute(insert_query, vals)

    con.commit()

1 Answer


Problem

I'm almost sure the problem is the number of queries you are trying to execute. Analyzing the complexity of your approach, I can assume it is O(n) in the number of rows: with roughly 15 * 10^6 rows, an execution time of around 15 minutes is what you would expect.

What to do?

Split the data into blocks, and insert multiple rows per transaction. You can start experimenting with 1000 rows per block, or derive the limit from the size of the data, e.g. by counting the line size of each row exported from the CSV file to the Postgres DB. A sketch of this batching approach follows below.
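As a rough sketch of that idea (the function name handler_batched, the default batch size of 1000, and the three-column table_test layout are assumptions carried over from the question, not a definitive implementation):

import csv
import itertools

def handler_batched(file_with_many_many_rows, con, batch_size=1000):
    cur = con.cursor()
    reader = csv.reader(file_with_many_many_rows)
    insert_query = "INSERT INTO table_test VALUES (%s, %s, %s);"

    while True:
        # Take the next block of rows from the CSV reader.
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        # Insert the whole block, then commit one transaction per block.
        cur.executemany(insert_query, batch)
        con.commit()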

Summary

If you are using psycopg2 directly, try the executemany method of the cursor to insert multiple rows at once. Here is an example of how to use executemany without splitting into blocks.
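A minimal sketch, assuming the same table_test layout and CSV file handle as in the question (handler_executemany is just an illustrative name):

import csv

def handler_executemany(file_with_many_many_rows, con):
    cur = con.cursor()
    reader = csv.reader(file_with_many_many_rows)
    # Materialize all rows, then hand them to executemany in a single call.
    rows = [tuple(line) for line in reader]
    cur.executemany("INSERT INTO table_test VALUES (%s, %s, %s);", rows)
    con.commit()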

Using pandas

You can also do the same thing with pandas, which is very good for managing time series data but works for this purpose as well, in the way described in this answer: importing-a-csv-file-into-a-sqlite3-database-table-using-python, using pandas.read_csv and pandas.DataFrame.to_sql.
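A sketch of that route, assuming a SQLAlchemy engine for the RDS endpoint; the connection string, file name, and column names below are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; replace with your RDS endpoint and credentials.
engine = create_engine("postgresql+psycopg2://user:password@your-rds-host:5432/dbname")

# Placeholder column names matching the three columns of table_test.
df = pd.read_csv("file_with_many_many_rows.csv", header=None,
                 names=["col1", "col2", "col3"])

# to_sql writes the DataFrame in batches of 1000 rows at a time.
df.to_sql("table_test", engine, if_exists="append", index=False, chunksize=1000)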
