0

I am getting a memory issue when I run the following query in my Java program with Progress DataDirect MongoDB JDBC driver with a large table in a database (40 million records):

String query = "SELECT * FROM (SELECT tablename.*, ROW_NUMBER() OVER() AS rowid FROM tablename)";

The query above works perfectly fine if I have a small table in the database.

If I just run "SELECT ROW_NUMBER() OVER() AS rowid FROM tablename", then the name of the column will become null. The AS seems not effect in the statement:

Column Name: null class java.lang.Integer

Exception in thread "main" java.lang.NullPointerException

If I just run "SELECT * FROM tablename", it is very fast to get the ResultSet object back and I can easily read the 40 million records in less than few minutes.

So what am I missing? I don't know how to use the ROW_NUMBER function correctly. Any idea? Thanks.

Community
  • 1
  • 1
Cao Felix
  • 329
  • 2
  • 6
  • 24
  • What is the objective after reading 40 million records ? – Vasu Nov 11 '16 at 21:40
  • The requirement is for my library to pass the ResultSet object with the rowid added to another application so they can do whatever they want with the records. – Cao Felix Nov 11 '16 at 23:01
  • And the name should be "rowid" because the library also support other different databases. In stead of using ResultSet.getInt(index) to get the number, we would like to call it by name ResultSet.getString("rowid"). I guess I could force to either put it in the beginning or the end, but I thought that was not a good idea. – Cao Felix Nov 11 '16 at 23:06
  • Did you read my answer, why can't you read them in batches ? – Vasu Nov 11 '16 at 23:06
  • because the other applications use the library wish to get the entire ResultSet as requirement – Cao Felix Nov 11 '16 at 23:11
  • 1
    Your column name being null problem is not a problem. That column doesn't have a name, only a label (the as clause), so you should get the column label instead. And if you can't retrieve it by label, then it is a bug in the driver. – Mark Rotteveel Nov 12 '16 at 08:26
  • According to the doc: "Some DatabaseMetaData methods return lists of information in the form of ResultSet objects. Regular ResultSet methods, such as getString and getInt, can be used to retrieve the data from these ResultSet objects. If a given form of metadata is not available, an empty ResultSet will be returned. Additional columns beyond the columns defined to be returned by the ResultSet object for a given method can be defined by the JDBC driver vendor and must be accessed by their column label." And I found http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname – Cao Felix Nov 12 '16 at 13:02
  • @MarkRotteveel thank you! you are right. I can use getColumnLabel. that is my answer. If you want to post it as answer, I will accept it. =] – Cao Felix Nov 12 '16 at 13:12
  • That quote is just about database meta data and how to handle vendor specific extensions in the meta data result sets. – Mark Rotteveel Nov 12 '16 at 13:43

2 Answers2

2

In your query, the row_number() has no column name, it only has a label (the as clause).

Luckily, the JDBC specification (section 15.2.3) requires you to retrieve columns by label and not by name. The column label in JDBC is either the value of the as clause - if specified - or otherwise the original column name.

You can obtain the column label by using ResultSetMetaData.getColumnLabel().

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
0

You will get OutOfMemoryError when try to load all of those (40 million) records at the same time into the JVM (as Heap size is limited which is according to -Xmx setting), so the best practice is NOT to load all of them at a time, rather fetch them as batches and process the each batch of records separately.

Vasu
  • 21,832
  • 11
  • 51
  • 67
  • nope, it does not give me error when I execute the query without adding "SELECT * FROM (" in the beginning. If I just run "SELECT tablename.*, ROW_NUMBER() OVER() AS rowid FROM tablename", it returns the ResultSet immediately. However, the name would become null and I can only use ResultSet.getInt(index) to get the row number. If I run "SELECT * FROM tablename" to get the 40 million records, I can finish reading all of them with ResultSet.next() in less than few minutes. – Cao Felix Nov 11 '16 at 23:10
  • So "SELECT * FROM (SELECT tablename.*, ROW_NUMBER() OVER() AS rowid FROM tablename)" works but only for small data set. And "SELECT tablename.*, ROW_NUMBER() OVER() AS rowid FROM tablename" works but name becomes null. I thought it might be driver issue so I contact progress for this issue. I just want to see if anyone knows anything... – Cao Felix Nov 11 '16 at 23:26