
I'm trying to pull some data out of my MySQL database using Java. The query returns about 15 million rows, and to prevent the ResultSet from exhausting memory I am streaming the results one row at a time.

private static void addEdges(DirectedGraph<String, DefaultEdge> g) throws SQLException {
    Statement selectID = null;
    String selectSQL = "SELECT citations.paperid, papers.id FROM papers INNER JOIN citations ON citations.title = papers.title AND citations.year = papers.year;";

    try {
        // A forward-only, read-only statement with a fetch size of
        // Integer.MIN_VALUE tells Connector/J to stream rows one at a
        // time instead of buffering the whole result set in memory.
        selectID = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        selectID.setFetchSize(Integer.MIN_VALUE);

        ResultSet rs = selectID.executeQuery(selectSQL);
        System.out.println("iterating");
        while (rs.next()) {
            // Each row becomes one directed edge in the graph.
            String from = rs.getString("paperid");
            String to = rs.getString("id");
            g.addEdge(from, to);
        }
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        if (selectID != null) {
            try {
                // Closing the Statement also closes its open ResultSet.
                selectID.close();
            } catch (Exception e) {
                System.out.println("Error closing statement");
            }
        }
    }
}

However, I'm now getting this error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:2092)
    at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:406)
    at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:385)
    at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:163)
    at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7472)
    at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:919)
    at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2617)
    at com.mysql.jdbc.StatementImpl.close(StatementImpl.java:580)

This is being thrown from selectID.close(), which is odd, since the ResultSet it is closing should only hold one row at a time given the streaming fetch size. I've tried allocating more memory to the JVM, but the issue still occurs. Any ideas? (I'm using the JGraphT library to create the directed graph.)
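
For reference, here is a minimal try-with-resources sketch (Java 7+) of the same logic, which guarantees the ResultSet is closed before the Statement; the addEdgesStreaming name is hypothetical, and it assumes the same conn field, graph types, and query as above:

import org.jgrapht.DirectedGraph;
import org.jgrapht.graph.DefaultEdge;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: same streaming setup, but try-with-resources closes the
// ResultSet first and then the Statement, even if the loop throws.
private static void addEdgesStreaming(DirectedGraph<String, DefaultEdge> g) throws SQLException {
    String selectSQL = "SELECT citations.paperid, papers.id FROM papers "
            + "INNER JOIN citations ON citations.title = papers.title "
            + "AND citations.year = papers.year";
    try (Statement selectID = conn.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        selectID.setFetchSize(Integer.MIN_VALUE); // streaming mode in Connector/J
        try (ResultSet rs = selectID.executeQuery(selectSQL)) {
            while (rs.next()) {
                g.addEdge(rs.getString("paperid"), rs.getString("id"));
            }
        } // rs closed here, before selectID
    }
}

Note that with a streaming result set, Connector/J has to read any unconsumed rows off the wire when the result set is closed, which is why close() ends up inside MysqlIO.nextRow() in the stack trace above.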

  • Maybe it's because you are creating an array inside the g instance. – Prakash Bhagat Apr 01 '14 at 07:38
  • http://www.coderanch.com/t/304547/JDBC/databases/Resultset-million-rows – codeMan Apr 01 '14 at 07:45
  • So is the ResultSet still caching the values even though I've set fetchSize() to use one row at a time? – BradStevenson Apr 01 '14 at 07:49
  • @BradStevenson it might. Could easily be a bug in the mysql library that you are using. You have the option of using *limit*. – Eugene Apr 01 '14 at 07:51
  • @BradStevenson try updating your MySQL driver as well. – Eugene Apr 01 '14 at 07:57
  • Are you sure you want to set the fetch size to a huge negative number: selectID.setFetchSize(Integer.MIN_VALUE) ? Have you tried selectID.setFetchSize(1) or similar? – Geoff Williams Apr 01 '14 at 08:17
  • @Eugene The query already takes a huge amount of time to complete, so using limit would have to be a last resort at the moment. I was using mysql-connector-java-5.1.29; I will try upgrading to 5.1.30, unless there are any preferable drivers? – BradStevenson Apr 01 '14 at 08:21
  • @GeoffWilliams I'm using MIN_VALUE because setFetchSize() wasn't honoured when I used any other value. I think it's a common issue and solution. – BradStevenson Apr 01 '14 at 08:24
  • Thanks @BradStevenson - interesting. Looked it up and found this, don't know if it helps: http://stackoverflow.com/questions/2447324/streaming-large-result-sets-with-mysql – Geoff Williams Apr 01 '14 at 08:29
  • @GeoffWilliams Setting the fetchSize to `Integer.MIN_VALUE` is the only way to prevent the MySQL driver from loading all rows at once. – Mark Rotteveel Apr 01 '14 at 09:29
  • Are you sure the problem isn't simply that the graph datastructure you are creating in memory is getting too large? – Mark Rotteveel Apr 01 '14 at 09:31
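
Following up on the last comment, a minimal sketch of one way to check that (the countRows helper is hypothetical; it reuses the same conn field, query, and imports as the sketch in the question): run the identical streaming query but only count rows, without touching the graph. If it completes cleanly, the graph itself is what is exhausting the heap.

// Sketch: same streaming query, but rows are only counted, never stored.
private static void countRows() throws SQLException {
    try (Statement st = conn.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        st.setFetchSize(Integer.MIN_VALUE);
        try (ResultSet rs = st.executeQuery(
                "SELECT citations.paperid, papers.id FROM papers "
                + "INNER JOIN citations ON citations.title = papers.title "
                + "AND citations.year = papers.year")) {
            long n = 0;
            while (rs.next()) {
                n++;
            }
            System.out.println("rows: " + n);
        }
    }
}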
