
I want to fetch some records (it can be 50, 100, or some other count configured by the user) from the database without using a LIMIT clause, because our application may run on multiple databases such as MySQL, Oracle, MSSQL, DB2, and so on.

I tried the following solution:

package com.test;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;
import java.sql.DriverManager;
import java.util.Date;

public class BatchRetrieveTest {
    private static final int FETCH_SIZE = 10;

    public static void main(String[] args) {
        new BatchRetrieveTest().test();
    }

    void test() {
        Connection conn = null;
        Statement stmt2 = null;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test",
                    "root", "root");
            // scrollable, read-only result set so absolute() can be used below
            stmt2 = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                    ResultSet.CONCUR_READ_ONLY);
            conn.setAutoCommit(false);
            stmt2.setFetchSize(FETCH_SIZE);
            stmt2.setPoolable(true);

            Date start = new Date();
            System.out.println(start + " second execute start "
                    + start.getTime());
            ResultSet rs2 = stmt2
                    .executeQuery("SELECT * FROM sample_final_attendance limit 1000");

            Date end = new Date();
            System.out.println(end + " ************* second execute end "
                    + (end.getTime() - start.getTime()));

            // jump ahead in the result set, then read about 100 rows
            rs2.absolute(200000);
            int i = 0;
            while (rs2.next()) {
                if (i++ > 100) {
                    break;
                }
            }
            rs2.close();
            stmt2.close();
            end = new Date();
            System.out.println(new Date() + " second read end "
                    + (end.getTime() - start.getTime()));
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (stmt2 != null) stmt2.close();
                if (conn != null) conn.close();
            } catch (Exception ignored) {
            }
        }
    }
}

Here the sample_final_attendance table contains 15 columns and 3.2 million records. Executing this program requires 2 GB of memory and 47 seconds of execution time.

I worry that if some table has billions of records, then this will fail to execute.

I also used setFetchSize as suggested, but the problem remains the same.

Please suggest a solution.

Thanks in advance.

Darshan Patel
  • @Scary Wombat that's correct, but that won't work on other databases; I want a common solution that works for all databases – Darshan Patel Jul 29 '14 at 06:29
  • How does this code work? The `LIMIT` clause of the SQL statement says get the first 1000 rows, and you then move to the 200,000th row of the result set. I would have thought you'd end up with `rs2.next()` returning `false`, and the `while` loop doing nothing. What am I missing? – DaveH Jul 29 '14 at 07:39
  • I used ResultSet.TYPE_SCROLL_SENSITIVE, so you can move forward to whatever record you want – Darshan Patel Jul 29 '14 at 08:28
  • But you have a result set with a maximum of 1000 rows in it, and you move to row 200,000 - how's that going to work? – DaveH Jul 29 '14 at 09:27
  • I set fetchSize to 1000, not maxFetchSize, so it works – Darshan Patel Jul 29 '14 at 10:25
  • How? "SELECT * FROM sample_final_attendance limit 1000" - this will only ever return the first 1000 rows from the database - it doesn't matter what your fetch size is - you'll only ever get 1000 rows. And if you move to the 200,000th row of the result set, you'll be placed after the end of the result set, and `rs.next()` will return `false`. – DaveH Jul 29 '14 at 10:44

2 Answers


Well, AFAIK and as I understand it, the problem is more related to the handling of data in polyglot storage. If you need to resolve this in all cases independent of the database type, the one common approach is to build a serving layer.

The serving layer can be a cache library or even a Map of Maps created by you. Do not attempt to query the database for a large number of records at once; instead, bring the data in as batches and store it as a pool of POJOs. On demand from the user, you can serve the data from the serving layer.

You can make use of Memcached or Hazelcast or many other cache libraries, which can be directly integrated with databases. I really don't know how complex your situation is; what I made is a suggestion. This makes up a data grid, which can be populated with data from any database in the background, as sketched below.
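
For illustration, here is a minimal sketch of such a serving layer in plain JDBC. It assumes the table has a numeric primary key column named id (an assumption, not something stated in the question), and a plain HashMap stands in for a real cache like Memcached or Hazelcast. Batching uses keyset pagination plus setMaxRows, both of which are portable across MySQL, Oracle, MSSQL, and DB2:

package com.test;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Serving-layer sketch: fetch rows in fixed-size batches with keyset
// pagination (WHERE id > ? ORDER BY id) plus Statement.setMaxRows.
// The HashMap stands in for a real cache such as Memcached or Hazelcast.
public class BatchServingLayer {
    private static final int BATCH_SIZE = 100; // user-configured batch size

    // cache: id the batch starts after -> rows of that batch
    private final Map<Long, List<Map<String, Object>>> cache = new HashMap<>();

    List<Map<String, Object>> fetchBatchAfter(Connection conn, long lastSeenId)
            throws SQLException {
        List<Map<String, Object>> cached = cache.get(lastSeenId);
        if (cached != null) {
            return cached; // serve from the layer, no database round trip
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM sample_final_attendance WHERE id > ? ORDER BY id")) {
            ps.setMaxRows(BATCH_SIZE);   // cap the batch without a LIMIT clause
            ps.setFetchSize(BATCH_SIZE); // hint the driver to stream in chunks
            ps.setLong(1, lastSeenId);
            try (ResultSet rs = ps.executeQuery()) {
                ResultSetMetaData md = rs.getMetaData();
                List<Map<String, Object>> batch = new ArrayList<>();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int c = 1; c <= md.getColumnCount(); c++) {
                        row.put(md.getColumnName(c), rs.getObject(c));
                    }
                    batch.add(row);
                }
                cache.put(lastSeenId, batch);
                return batch;
            }
        }
    }
}

Each call pulls at most BATCH_SIZE rows into memory, so the footprint stays proportional to the batch size rather than to the table size.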

Kris
  • thanks for the reply, that's correct, I need to create a layer that serves data, but I want a solution for limiting the number of records returned at once. I used `setFetchSize` on the `Statement`, but executing that statement takes a lot of time and almost 2 GB of memory, so is there any way to execute the query within 2-3 MB of RAM and within 2-3 seconds? – Darshan Patel Jul 29 '14 at 06:58

We have setMaxRows(int max) in the Statement object; this will limit the number of rows generated by the Statement object and simply ignore the remaining.

Take a look at the doc.
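
For example, here is a minimal sketch of that approach, reusing the connection details and table from the question (the cap of 100 rows is an assumed, user-configurable value):

package com.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MaxRowsExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "root");
             Statement stmt = conn.createStatement()) {
            // Cap the result at 100 rows without a LIMIT clause; the driver
            // simply discards any rows beyond this count.
            stmt.setMaxRows(100);
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM sample_final_attendance")) {
                while (rs.next()) {
                    // process each row here
                }
            }
        }
    }
}

Because setMaxRows is part of the standard JDBC Statement API, the same code runs unchanged on MySQL, Oracle, MSSQL, and DB2.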

dilly