I am accessing an HTTP REST API using `requests`. The API returns JSON, which I would like to parse and then commit to an `sqlite3` database. One JSON response might look roughly like the following:
    {
        "entry1": {
            "key1": "value1",
            "key2": "value2"
        },
        "entry2": {
            "key2": "value2"
        }
    }
Another may look like the following:
    {
        "entry3": {
            "key1": "value1",
            "key2": "value2",
            "key3": "value3"
        }
    }
I would like to save this to a table as follows:
    +--------+--------+--------+--------+
    | entry  | key1   | key2   | key3   |
    +--------+--------+--------+--------+
    | entry1 | value1 | value2 | NULL   |
    +--------+--------+--------+--------+
    | entry2 | NULL   | value2 | NULL   |
    +--------+--------+--------+--------+
    | entry3 | value1 | value2 | value3 |
    +--------+--------+--------+--------+
In reality I am looping over thousands of entries. My issue is as follows:
I do not know all the possible keys `key1`, `key2`, `key3`, ..., `key_n` that can occur, as the API does not document them. It is also too costly to make one pass over the entire dataset just to find out which columns could possibly occur, because every request to the API is charged.
There is one obvious solution: for each request, check whether the columns needed by its entries already exist in my table, and add columns to the table as needed, roughly like the sketch below.
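A minimal sketch of what I mean, using only the standard-library `sqlite3` module (the table name `entries` and the database file are hypothetical; `PRAGMA table_info` lists the columns that already exist, and since identifiers cannot be bound as SQL parameters, the key names are interpolated directly and therefore must come from a trusted source):

    import sqlite3

    conn = sqlite3.connect("api.db")  # hypothetical database file
    conn.execute("CREATE TABLE IF NOT EXISTS entries (entry TEXT PRIMARY KEY)")

    def insert_entries(conn, data):
        """Insert one parsed JSON response, adding missing columns first.

        `data` is the parsed dict, e.g. {"entry1": {"key1": "value1"}, ...}.
        """
        # PRAGMA table_info yields one row per column; the name is field 1.
        existing = {row[1] for row in conn.execute("PRAGMA table_info(entries)")}
        for entry, fields in data.items():
            for key in fields:
                if key not in existing:
                    # Column names cannot be parameterized with "?", so the
                    # key is quoted and interpolated directly.
                    conn.execute(f'ALTER TABLE entries ADD COLUMN "{key}"')
                    existing.add(key)
            cols = ["entry", *fields]
            col_sql = ", ".join(f'"{c}"' for c in cols)
            placeholders = ", ".join("?" for _ in cols)
            conn.execute(
                f"INSERT INTO entries ({col_sql}) VALUES ({placeholders})",
                (entry, *fields.values()),
            )
        conn.commit()

Each response would then be committed with something like `insert_entries(conn, response.json())`.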
My question is twofold:
- Should I perform the above check in my Python code on a `pandas.DataFrame`, or should I commit directly to my `sqlite3` database using SQL commands, thereby avoiding the need to hold the `DataFrame` in RAM at runtime? Is it bad practice to add columns like this (i.e. using an `ALTER TABLE` statement) in `sqlite3`?
- Have I overlooked other (potentially better) ways to go about this?
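For reference, the `pandas` variant of the first option that I have in mind would look roughly like this (a sketch under the same assumptions as above; `pandas.concat` aligns the columns of the two frames by name and fills the gaps with NaN, which `to_sql` writes out as NULL):

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("api.db")  # hypothetical database file

    def append_response(conn, data):
        """Merge one parsed JSON response into the `entries` table."""
        new = pd.DataFrame.from_dict(data, orient="index")
        exists = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='entries'"
        ).fetchone()
        old = (pd.read_sql("SELECT * FROM entries", conn, index_col="entry")
               if exists else pd.DataFrame())
        # concat aligns columns by name; entries missing a key get NaN.
        merged = pd.concat([old, new])
        merged.to_sql("entries", conn, if_exists="replace", index_label="entry")

This works, but it keeps the whole accumulated table in RAM and rewrites it on every request, which is what prompts my question.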