So my use-case is this: I need to parse 800,000 files in an S3 bucket, extract the data I need, and load it into a SQLite database.
I tried looping over the files and writing each record into the DB individually, but this was taking too long (over 36 hours had passed and I wasn't even 40% done).
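For context, my first attempt looked roughly like this (parse_file, s3_keys, and the column count are stand-ins for my actual extraction logic):
import sqlite3

conn = sqlite3.connect("example.db")
c = conn.cursor()
for key in s3_keys:  # the 800k S3 object keys
    abc = parse_file(key)  # stand-in: returns a tuple of values for table_abc
    c.execute("insert into table_abc values (?, ?)", abc)
    conn.commit()  # I suspect committing after every row is part of why this was so slow
conn.close()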
So my next approach is to loop over the 800k files, extract the data I need, and write the resulting INSERT statements to a .txt file. That way I can simply execute this giant .txt file as a SQL script and load all the rows in one go.
This is the premise:
# First, I create a .txt file that houses the queries I want.
# abc holds the values extracted from the current file.
query = f"insert into table_abc values ({abc})"
with open("sql_queries.txt", "a") as file_object:
    file_object.write(query + "\n")
# Then, I execute the queries within this .txt file to load them into my db.
import sqlite3
conn = sqlite3.connect("example.db")
c = conn.cursor()
with open("sql_queries.txt") as sql_file:
sql_as_string = sql_file.read()
c.executescript(sql_as_string)
The problem, of course, is that some values contain characters that never get escaped or properly sanitized. Is there any way to put parameterized queries into my text file instead, so that when it's run it doesn't hit problems like OperationalError: unrecognized token: "^"?
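For illustration (the data here is made up), a line in sql_queries.txt can end up looking like this once an unquoted value sneaks in, which is exactly what trips the tokenizer:
insert into table_abc values (foo^bar, 42)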
I know why this is happening: my .txt file houses SQL INSERT statements with characters that aren't properly escaped. But I'm unsure how to go about building a .txt file where the queries are properly sanitized, and there doesn't seem to be much info on this beyond passing parameterized queries directly to the database write calls.
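For reference, this is the parameterized style I mean, and it works fine when I execute against the database directly (the values here are invented to show the problem characters):
# Parameterized insert executed directly against the DB;
# sqlite3 binds the values itself, so nothing needs manual escaping.
c.execute("insert into table_abc values (?, ?)", ("foo^bar", "it's fine"))
conn.commit()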