
I'm writing a job that needs to read tons of data out of files and process it. Currently I just save everything into a Set, but obviously that doesn't work: after running the job for a couple of minutes, it fails with an

"out of memory: java heap" error.

Now I'm worried, because reading is only the start of the job: once I get all the data in, I need to build a table to process it. If I can't even read all the data, how can I build this giant table? My original plan was to use Google Guava's Table class. Are there any better options out there?

Brian Tompsett - 汤莱恩
user468587

1 Answer


As others are saying, this is pretty tough to answer without knowing more detail. Since you are considering using a collection to hold all of this data, it sounds like you can't merely process it row by row: processing one row potentially requires other data in the table.

That means you need a file-backed DB of some sort. If you don't have access to an ordinary relational database to handle this, then you might consider using an embedded database such as H2 or JavaDB/Derby. These databases run in the same JVM as your application, but they can keep large tables in a persistent on-disk store instead of on the Java heap if you configure them accordingly.
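With these embedded databases, the "configure accordingly" part is mostly the JDBC URL you connect with. The sketch below only collects a few URLs as I understand them (the database name pruneDB is just an example): a file-backed H2 URL, an in-memory H2 URL for contrast, and an embedded Derby URL. Note that an in-memory URL keeps every row on the heap, which would bring back the original error for a data set this big. The matching driver jar (H2 or Derby) is assumed to be on the classpath when you actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class EmbeddedDbUrls {
    // H2, file-backed: tables are stored in files next to the working
    // directory (e.g. pruneDB.mv.db), so they can grow beyond the heap.
    // Newer H2 versions require the explicit "./" for a relative path.
    static final String H2_FILE = "jdbc:h2:./pruneDB";

    // H2, in-memory: every row lives on the Java heap, so this would run
    // out of memory just like the Set did for a very large data set.
    static final String H2_MEM = "jdbc:h2:mem:pruneDB";

    // Derby (JavaDB), embedded: ";create=true" creates the database
    // directory "pruneDB" if it does not exist yet.
    static final String DERBY_FILE = "jdbc:derby:pruneDB;create=true";

    // Opens the database inside this JVM. Throws SQLException
    // ("No suitable driver") if no driver on the classpath accepts the URL.
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }
}
```

For this job you would pick H2_FILE or DERBY_FILE; H2_MEM is only useful for small tests.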

* EDIT *

Here is some code showing how this could look with something like H2 (exception handling omitted):

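import java.sql.*;

// Needs the H2 jar on the classpath. Newer H2 versions reject bare relative
// names such as "jdbc:h2:pruneDB", so the path is written as "./pruneDB".
Connection connection = DriverManager.getConnection("jdbc:h2:./pruneDB");
Statement stmt = connection.createStatement();
stmt.execute("CREATE TABLE PERSON (USER_ID INT, ITEM_ID INT, BOOK_ID INT)");
stmt.close();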

At this point, create a loop which reads your rows of data and inserts them into the DB:

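PreparedStatement insert = connection.prepareStatement(
        "INSERT INTO PERSON (USER_ID, ITEM_ID, BOOK_ID) VALUES (?, ?, ?)");
while (hasMoreRows()) {
    // ... read the three IDs you need into variables from your file ...

    int bookId = someValueFromTheTextRow;
    int userId = someOtherValueFromTheTextRow;
    int itemId = yetAnotherValueFromTheTextRow;

    // Bind the IDs to the prepared INSERT and write this row to the table,
    // so rows go to the database instead of piling up in a Java collection.
    insert.setInt(1, userId);
    insert.setInt(2, itemId);
    insert.setInt(3, bookId);
    insert.executeUpdate();
}
insert.close();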
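Sending one INSERT per row works, but with tons of rows it can be slow. A common JDBC alternative is addBatch()/executeBatch(), flushed every N rows so the batch itself never grows with the file. The sketch below assumes, hypothetically, that your file-reading code can hand rows over as an Iterator<int[]> of {userId, itemId, bookId}; the batch size is yours to tune, and it targets the PERSON table created above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

final class BatchedInsert {
    /**
     * Inserts {userId, itemId, bookId} triples into PERSON, sending them to
     * the database every batchSize rows. Returns how many batches were sent.
     */
    static int insertAll(Connection connection, Iterator<int[]> rows, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO PERSON (USER_ID, ITEM_ID, BOOK_ID) VALUES (?, ?, ?)";
        int pending = 0;
        int batches = 0;
        try (PreparedStatement insert = connection.prepareStatement(sql)) {
            while (rows.hasNext()) {
                int[] row = rows.next(); // hypothetical layout: {userId, itemId, bookId}
                insert.setInt(1, row[0]);
                insert.setInt(2, row[1]);
                insert.setInt(3, row[2]);
                insert.addBatch();
                if (++pending == batchSize) { // flush a full batch
                    insert.executeBatch();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final, partial batch
                insert.executeBatch();
                batches++;
            }
        }
        return batches;
    }
}
```

Flushing periodically matters here: calling executeBatch() only once at the end would queue every row in memory first, which is exactly the problem you are trying to avoid.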

Once you are out of the loop, you can use standard SQL to selectively delete rows from that table.

mightyrick
  • The requirement is to prune some less useful data. Say I build the table with user ID as the row, item ID as the column, and the cell as the list of books related to that user and item. Then I scan the table: first, if a user has only one item associated with him, remove all the books related to that user; second, if an item has only one user associated with it, remove all the books related to that item... and there are more rules for pruning the data. So I have to build the table with all the data first and then do the pruning. – user468587 Jan 10 '13 at 21:04
  • Ok, in this situation I definitely would use something like JavaDB/Derby or H2. What you'll want to do is instantiate the DB inside of your code, create the table, populate it, and then prune from it after it is populated. All of this using standard SQL statements. – mightyrick Jan 10 '13 at 21:26
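As a rough illustration of "prune from it after it is populated", here is one way the first two rules from the comment above might be written as SQL DELETE statements against the PERSON table, run in the order given (so rule 2 sees the table after rule 1). This is a sketch under my own reading of the rules ("only one item" meaning one distinct ITEM_ID per user, and vice versa); your remaining rules would be added to the list the same way. It also assumes the database accepts a subquery on the table being deleted from. I believe H2 does, but not every database does (MySQL, for one, rejects it).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

final class PruneRules {
    // Rule 1: a user associated with only one distinct item loses all
    // of their rows (i.e. all books related to that user).
    static final String DROP_SINGLE_ITEM_USERS =
            "DELETE FROM PERSON WHERE USER_ID IN ("
          + " SELECT USER_ID FROM PERSON GROUP BY USER_ID"
          + " HAVING COUNT(DISTINCT ITEM_ID) = 1)";

    // Rule 2: an item associated with only one distinct user loses all
    // of its rows (i.e. all books related to that item).
    static final String DROP_SINGLE_USER_ITEMS =
            "DELETE FROM PERSON WHERE ITEM_ID IN ("
          + " SELECT ITEM_ID FROM PERSON GROUP BY ITEM_ID"
          + " HAVING COUNT(DISTINCT USER_ID) = 1)";

    // Applied strictly in this order; append further rules here.
    static final List<String> RULES =
            Arrays.asList(DROP_SINGLE_ITEM_USERS, DROP_SINGLE_USER_ITEMS);

    /** Runs each rule in order and returns the total number of rows deleted. */
    static int prune(Connection connection) throws SQLException {
        int deleted = 0;
        try (Statement stmt = connection.createStatement()) {
            for (String rule : RULES) {
                deleted += stmt.executeUpdate(rule);
            }
        }
        return deleted;
    }
}
```

All the grouping and counting happens inside the database, so the JVM never has to hold the whole table at once.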