I'm using the program below to insert values from very large .csv files (~2.5 million lines) into an SQLite DB. It starts very fast, but begins to slow over time, before eventually hanging indefinitely at around 900,000 lines. My hunch says that it's eating up memory somehow, but not quite a memory leak, since it never throws an OutOfMemoryException
or similar. To be clear, the program never fails, or crashes. It just gets slower until it stops progressing. All other processes on my laptop are also affected, and eventually it takes ~10 seconds to even register mouse movements.
I'm not very experienced with databases, so it could easily be something stupid I'm doing with how I execute the INSERT
statement. The most recent modification I made was to use PreparedStatement.addBatch()
and PreparedStatement.executeBatch()
and despite reading the documentation, I'm still not very clear on whether I'm using them correctly. I'm using sqlite-jdbc-3.7.2.jar if that makes a difference.
public class Database{
public static void main(String[] args){
Connection c = connect("db.db");
// createTable(c);
addCSVToDatabase(c, "test-10000.csv");
// print(c);
disconnect(c);
}
public static void createTable(Connection c) {
Statement stmt;
String sql = "CREATE TABLE results("
+ "ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
+ "TITLE TEXT NOT NULL, "
+ "URL TEXT NOT NULL UNIQUE, "
+ "BEAN BLOB"
+ ");";
System.out.println("QUERY: " + sql);
try {
stmt = c.createStatement();
stmt.executeUpdate(sql);
} catch (SQLException e) { e.printStackTrace();}
}
public static void addCSVToDatabase(Connection c, String csvFile){
BufferedReader reader = null;
int x = 0;
DBEntryBean b;
String[] vals;
ByteArrayOutputStream baos = null;
ObjectOutputStream oos = null;
PreparedStatement pstmt = null;
String sql = "INSERT OR IGNORE INTO results("
+ "TITLE, "
+ "URL, "
+ "BEAN"
+ ") VALUES(?, ?, ?);";
try{
pstmt = c.prepareStatement(sql);
reader = new BufferedReader(new InputStreamReader(new FileInputStream(csvFile), "UTF-8"));
c.setAutoCommit(false);
for(String line; (line = reader.readLine()) != null;){
vals = line.split("\\|"); // Each line is of the form: "title|URL|...|...|..."
b = new DBEntryBean();
b.setTitle(vals[0]);
b.setURL(vals[1]);
pstmt.setString(Constants.DB_COL_TITLE, b.getTitle());
pstmt.setString(Constants.DB_COL_URL, b.getURL());
// Store the DBEntryBean in the table so I can retrieve it, rather than construct a new one every time I need it.
baos = new ByteArrayOutputStream();
oos = new ObjectOutputStream(baos);
oos.writeObject(b);
pstmt.setBytes(Constants.DB_COL_BEAN, baos.toByteArray());
pstmt.addBatch();
pstmt.executeBatch();
System.out.println("Line: " + x++);
}
} catch (Exception e){ e.printStackTrace();
} finally{
try{
if(pstmt != null){ pstmt.close(); }
c.setAutoCommit(true);
} catch (SQLException e) { e.printStackTrace(); }
}
}
private static Connection connect(String path) {
String url = "jdbc:sqlite:" + path;
Connection conn = null;
try {
Class.forName("org.sqlite.JDBC");
conn = DriverManager.getConnection(url);
} catch (Exception e) { e.printStackTrace(); }
return conn;
}
private static void disconnect(Connection c) {
try{ if(c != null){ c.close(); }
} catch(SQLException e){ e.printStackTrace(); }
}
private static void print(Connection c){
Statement stmt = null;
String sql = "SELECT * FROM results;";
ResultSet rs = null;
try {
stmt = c.createStatement();
rs = stmt.executeQuery(sql);
while(rs.next()){
System.out.println(rs.getString("TITLE"));
}
} catch(Exception e){ e.printStackTrace(); }
}
}