
I have a folder with more than 150 txt files.

To combine the data, I concatenated all of the files into a single file.
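
Roughly like this (a minimal sketch; the folder name names/ and the file pattern are placeholders):

import glob

# Concatenate every .txt file in the folder into one file.
with open("resall.TXT", "w") as out:
    for path in sorted(glob.glob("names/*.txt")):  # placeholder folder name
        with open(path, "r") as f:
            out.write(f.read())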

The resulting file has a size of 320 MB. I create a database and insert the combined data:

with open("resall.TXT", "r") as f:
    result = f.readlines()
result = [x.split(",") for x in result]

import sqlite3 as sq3

con = sq3.connect("popular.db")

con.execute(
    """
    CREATE TABLE IF NOT EXISTS popname  (
        id INTEGER PRIMARY KEY,
        state TEXT,
        sex TEXT,
        year TEXT,
        forename TEXT,
        count INTEGER
    );
    """
)

for i, (state, sex, year, forename, count) in enumerate(result):
    con.execute(
        """
        INSERT INTO popname VALUES (?, ?, ?, ?, ?, ?);
        """,
        (i, state, sex, year, forename, count.strip()),
    )
    con.commit()

I could not create the database because it is too large. How can I reduce the size of the database?

  • "*I could not create the database...*" What error did you get? Or did it simply take too long? – Schwern Oct 31 '22 at 23:57
  • It takes too long; it has been running for 8 hours and still has not finished. –  Nov 01 '22 at 06:31
  • That would probably be because you're committing after every insert. – Schwern Nov 01 '22 at 07:02
  • But it says ValueError: expecting 5, got 2. –  Nov 01 '22 at 07:03
  • With your code I got the error message ValueError. –  Nov 01 '22 at 07:03
  • *On what line?* I'm not much of a Python programmer, so there might be small errors. The point isn't to cut & paste the code; it's to use the techniques. Don't slurp in the whole huge file. Only commit every 1000 rows or so. – Schwern Nov 01 '22 at 17:15
  • I could solve it now; there is no more issue. Thanks a lot for your help. –  Nov 01 '22 at 21:32

1 Answer


The problem is not SQLite, it is your import script. SQLite can easily handle that much data; 320 MB is not that large.

The first problem is that you're slurping all 320 MB into memory at once. Instead, iterate over the lines:

with open("resall.TXT", "r") as f:
    for line in f:
        row = line.split(",")
        # ... insert row here (full loop below)

When you do your inserts, committing after every row makes the import dramatically slower: every commit forces SQLite to write to disk, which is slow.

Instead, commit every 1000 rows or so.

with open("resall.TXT", "r") as f:
    for i, line in enumerate(f):
        state, sex, year, forename, count = line.split(",")
        con.execute(
            """
            INSERT INTO popname VALUES (?, ?, ?, ?, ?, ?);
            """,
            (i, state, sex, year, forename, count.strip()),
        )
        # Commit in batches, not after every row.
        if i % 1000 == 0:
            con.commit()

con.commit()  # commit the final partial batch
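
A related sqlite3 idiom, sketched here under the same schema, is to stream the rows through executemany and commit once at the end (gen_rows is a hypothetical helper name):

import sqlite3 as sq3

con = sq3.connect("popular.db")

def gen_rows(path):
    # Yield one parameter tuple per line, streaming the file.
    with open(path, "r") as f:
        for i, line in enumerate(f):
            state, sex, year, forename, count = line.split(",")
            yield (i, state, sex, year, forename, count.strip())

# executemany consumes the generator inside a single transaction.
con.executemany("INSERT INTO popname VALUES (?, ?, ?, ?, ?, ?);", gen_rows("resall.TXT"))
con.commit()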

However, there's a much faster way to do this: use SQLite's own built-in importer.

  1. Add a header line to your file, making sure the names match your column names.
  2. Open your database in the SQLite shell.
  3. Create your table.
  4. Import the CSV file(s) into your table: .import --csv --skip 1 /path/to/your/file.csv popname (--skip 1 skips the header row, since every row imported into an existing table is treated as data; a sample session is sketched below).
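
A sample session, assuming the combined file was saved with a header line as resall.csv (the file name is a placeholder; --csv and --skip need a reasonably recent sqlite3 shell). The table here matches the five CSV columns, since SQLite's implicit rowid already covers the id:

$ sqlite3 popular.db
sqlite> CREATE TABLE popname (
   ...>     state TEXT,
   ...>     sex TEXT,
   ...>     year TEXT,
   ...>     forename TEXT,
   ...>     count INTEGER
   ...> );
sqlite> .import --csv --skip 1 resall.csv popname
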
Schwern