1

I have a very common encapsulation problem. I am querying a table through jdbc and needs to hold the records in memory for sometime for processing.I dont have any hibernate POJO for the same table to create any objects and save.I am talking about a load of say 200 million in a single query.

The common approach is to create an object array and do casting when I need to use them. (Assume, I can get the table details like column name and data type which will be saved in some reference tables..) But this approach will be very expensive (Time) I guess when the load is taken into consideration..

Any good approach will be appreciated...

Arjun
  • 11
  • 3
  • My concern is on time taken for the processing, Assume I have enough memory in the system... – Arjun Jul 03 '14 at 12:46
  • You will need **a lot of** memory to be able to keep 200 million rows in memory. –  Jul 03 '14 at 12:48
  • Forget memory for now.. I can fire multiple queries and create threads... The problem is how will I hold the record in memory without creating an object Array.It will be good if I can avoid the time in type casting the items, as there will be many columns in each record – Arjun Jul 03 '14 at 12:53
  • You probably guess wrong about type casting being very expensive, see http://stackoverflow.com/a/8803559/3080094 – vanOekel Jul 03 '14 at 18:31

3 Answers3

1

Sounds like a CachedRowSet would do the trick here. That's pretty much exactly what you want. It will take a ResultSet and suck the entire thing down, then you can work on it at your leisure.

Addenda:

I am really looking for a robust record holder with easy access on the members

But that's pretty much exactly what a CachedRowSet is.

It manages a collection of records with named (and numbered) columns, and provides typed access to those columns.

CachedRowSet crs = getACachedRowSet();
crs.absolute(5) // go to 5th row, shows you have random access to the contents.
String name = crs.getString("Name");
int age = crs.getInt("Age");
date dob = crs.getDate("DateOfBirth");

While I'm sure you can make up something on your own, a CachedRowSet gives you everything you've asked for. If you don't want to actually load the data in to RAM, you could just use a ResultSet.

Only down side is that it's not thread safe, so you'll need to synchronize around it. But that's life. How exactly does a CachedRowSet not meet your needs?

Will Hartung
  • 115,893
  • 19
  • 128
  • 203
  • Thanks for your comments..But I am afraid that It won't help me. Let me make it clear with an example. Assume the table is having the columns Name,Age and Date Of Birth. Now the result set has three rows. I dont want to cache this as my processing can be threaded and even I can fire multiple queries. How will I store this three columns in memory and use them when required with out complications. Object Array was one solution but Type casting is the problem. I am really looking for a robust record holder with easy access on the members – Arjun Jul 03 '14 at 13:23
  • If I can compile and keep a class on the fly and work on it (create arrays of that class object), It would have solved the problem. But to do that I should try reflection and all, which will be an overhead. I just wanted to know if there is any other better way. – Arjun Jul 03 '14 at 13:25
  • I know about CachedRowSet..but as I mentioned earlier, I dont have any problem with the memory.. I can handle the resultset.. – Arjun Jul 03 '14 at 13:27
  • How do you know the type of name is String. In my case I will not know that and hence I cant call getString. For example one row will be James,25,10/10/1988 - How will I know that first one is String next is int and third is date...The number of columns is configurable and it can vary... I may know the column name and type.. – Arjun Jul 03 '14 at 17:49
  • You can get meta data from the RowSet that will tell you the types (as reported by the database) of the columns. – Will Hartung Jul 03 '14 at 18:45
0

Well, if you need 200m objects in memory then you can initialize each while iterating through the ResultSet - you don't need to save the metadata

 ResultSet rs = stmt.executeQuery();
 while (rs.next()) {
    String col1= rs.getString("col1");
    Integer col2= rs.getInt("col2");
    MyClass o = new MyClass(col1,col2);
    add(o);
 }
 rs.close();
DanielM
  • 3,598
  • 5
  • 37
  • 53
  • No buddy, I cant do this. I cant create a class (MYClass in your Example) and deploy it. I already mentioned that I have no POJO created for the concerned table. – Arjun Jul 03 '14 at 12:57
  • To make it more clear, the table involved is completely configurable. That is why I cant create POJO classes prior to this task. – Arjun Jul 03 '14 at 12:59
0

To make it more clear, the table involved is completely configurable. That is why I cant create POJO classes prior to this task.

In that case, I'd have thought that the only real way of doing this is to turn each row into a string with delimiters (CSV, XML or something) and then create an array of strings.

You can get a list of column names returned by a JDBC query as in this answer:

Retrieve column names from java.sql.ResultSet

Community
  • 1
  • 1
Jaydee
  • 4,138
  • 1
  • 19
  • 20
  • Thanks for the answer.. But again this will be a time consuming one.. As I mentioned earlier I will be doing the processing with these records..In this case, say if I am storing it as a delimited string, then each time I have to parse the string to get the individual columns. Then again I have to cast the string to the required data types which is more or less same as storing the data in an array of objects... Does that make sense?? – Arjun Jul 03 '14 at 13:14
  • Perhaps this needs to be looked at from a different angle. What sort of processing do you need to do? – Jaydee Jul 03 '14 at 14:26
  • I will be doing some business calculations. I don't want to complicate the things here. In simple words, I will take a record and split it into multiple based on some keys... – Arjun Jul 03 '14 at 17:46