
I am working on a Java program that parses files into lists and then inserts the data into a DB. This runs on a server with a ton of memory. Are there Java limitations I need to be aware of?

For instance, should I avoid parsing a GB of data into a list before inserting it into the DB?

glyphx
    Your program will probably have a default max amount of memory allocated to it that you should be able to increase. Your OS might have limits on how much memory a single program can use (I think with Windows it's 2GB?), but that may or may not be a problem. – Michelle Nov 05 '13 at 21:46
  • Is there some reason why you can't use threads, one to mine and one to store, with some sort of thread-safe queue in the middle? – Tony Hopkinson Nov 05 '13 at 21:47

5 Answers


You have more limitations than just Java to worry about.

There's network bandwidth usage to consider, hogging your database server's CPU, filling up the database transaction log, JDBC performance for mass inserts, and slowness while the database updates its indexes or generates artificial keys.

If your inputs get too huge, you need to split them into chunks and commit the chunks separately. How big is too big depends on your database.
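A rough sketch of what chunked commits could look like over plain JDBC (the table, columns, and chunk size here are made up and would need tuning for your database):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class ChunkedInsert {

        private static final int CHUNK_SIZE = 1_000; // tune for your database

        // Inserts rows in batches, committing after each chunk so the
        // transaction log never has to hold the whole data set at once.
        static void insertInChunks(Connection conn, List<String[]> rows) throws SQLException {
            conn.setAutoCommit(false);
            String sql = "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)"; // hypothetical table
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int count = 0;
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                    if (++count % CHUNK_SIZE == 0) {
                        ps.executeBatch();
                        conn.commit(); // each chunk is its own transaction
                    }
                }
                ps.executeBatch(); // flush the final partial chunk
                conn.commit();
            }
        }
    }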

The way your artificial keys get allocated can slow the process down; you may need to create batches of values ahead of time, such as by using a hi/lo generator.
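If you roll your own, a hi/lo allocator boils down to "fetch a block number from the database once, then hand out BLOCK_SIZE keys locally". A minimal sketch, assuming a single-row table id_allocator (next_hi BIGINT) that I've invented for the example; a real implementation would do the read-and-increment in its own short transaction:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class HiLoIdGenerator {

        private static final long BLOCK_SIZE = 1_000;

        private final Connection conn;
        private long hi = -1;          // block number claimed from the database
        private long lo = BLOCK_SIZE;  // forces a fetch on first use

        public HiLoIdGenerator(Connection conn) {
            this.conn = conn;
        }

        // One database round trip per BLOCK_SIZE keys instead of one per row.
        public synchronized long nextId() throws SQLException {
            if (lo >= BLOCK_SIZE) {
                hi = fetchNextHi();
                lo = 0;
            }
            return hi * BLOCK_SIZE + lo++;
        }

        private long fetchNextHi() throws SQLException {
            try (PreparedStatement update = conn.prepareStatement(
                     "UPDATE id_allocator SET next_hi = next_hi + 1");
                 PreparedStatement select = conn.prepareStatement(
                     "SELECT next_hi FROM id_allocator")) {
                update.executeUpdate();
                try (ResultSet rs = select.executeQuery()) {
                    rs.next();
                    return rs.getLong(1) - 1; // the block we just claimed
                }
            }
        }
    }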

Kicking off a bunch of threads and hammering the database server with them would just cause contention and make the database server work harder, as it has to sort out the transactions and make sure they don't interfere with each other.

Consider writing to some kind of delimited file, then running a bulk-insert utility to load its contents into the database. That way the database actually cooperates: it can suspend updating indexes and checking constraints, and sequences and transactions aren't an issue. It is orders of magnitude faster than JDBC.
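For example, you could dump the parsed rows to a tab-delimited file from Java and then hand that file to your database's own loader (the exact command depends on the database; the table name and loader commands below are just illustrative):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class BulkFileWriter {

        // Writes rows out as a tab-delimited file for a bulk loader, e.g.
        //   PostgreSQL: psql -c "\copy my_table FROM 'rows.tsv'"
        //   MySQL:      LOAD DATA INFILE 'rows.tsv' INTO TABLE my_table
        static void writeDelimited(List<String[]> rows, Path out) throws IOException {
            try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                for (String[] row : rows) {
                    writer.write(String.join("\t", row));
                    writer.newLine();
                }
            }
        }

        public static void main(String[] args) throws IOException {
            writeDelimited(List.of(new String[]{"1", "alice"},
                                   new String[]{"2", "bob"}),
                           Paths.get("rows.tsv"));
        }
    }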

Nathan Hughes

Nathan's answer is decent, so I'll only add a few bits here...

If you are not doing anything terribly sophisticated in your program, then it might be good practice to work in a streaming fashion: in simple terms, read the input a line at a time and write it directly to a file, finally calling the database's specific bulk upload tool (most of them have one).

Reading all the lines into memory and then calling insert() in a loop would be pretty inefficient.
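A streaming version of the same job might look like this (the file names and the parse step are placeholders):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class StreamingParse {

        // Reads one line at a time and writes the parsed line straight out,
        // so memory use stays flat no matter how big the input file is.
        public static void main(String[] args) throws IOException {
            try (BufferedReader in = Files.newBufferedReader(
                     Paths.get("input.txt"), StandardCharsets.UTF_8);
                 BufferedWriter out = Files.newBufferedWriter(
                     Paths.get("load_file.tsv"), StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(parse(line));
                    out.newLine();
                }
            }
            // then hand load_file.tsv to the database's bulk upload tool
        }

        private static String parse(String line) {
            return line.trim(); // stand-in for the real parsing logic
        }
    }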

You don't give us many clues about why you are reading in this data all in one go. Is there a reason for needing to do this?

phatmanace

It depends on how much memory you have allocated for the JVM.

How much memory you can allocate to the JVM depends in turn on whether you are running the Client VM or the Server VM.

Check -Xmx and -Xms settings.
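You can check what your program actually got by printing the heap ceiling at startup (the flag values below are just examples):

    public class HeapCheck {

        // Run as, say:  java -Xms4g -Xmx8g HeapCheck
        public static void main(String[] args) {
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %d MB%n", maxBytes / (1024 * 1024));
        }
    }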

kosa

Not directly, but you may want to tweak the JVM arguments a bit.

The question "What are the Xms and Xmx parameters when starting JVMs?" might be a useful reference.

jessebs

The limits you might need to be aware of are

  • a List cannot have 2^31 entries or more.
  • a JVM scales up well to 32 GB but not much higher, as the cost of GC increases with the heap size (unless you have Azul's Zing)

Tons of memory is 256-512 GB these days, and I would suggest using off-heap memory (or Zing) if you need more than 32 GB in one JVM.
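One simple way to go off heap, if you don't want a library, is a direct ByteBuffer; the memory it wraps is not scanned by the GC. A minimal sketch (note the total direct memory is governed by -XX:MaxDirectMemorySize, and a single buffer is still capped at Integer.MAX_VALUE bytes):

    import java.nio.ByteBuffer;

    public class OffHeapBuffer {

        public static void main(String[] args) {
            // 1 GB allocated outside the garbage-collected heap
            ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 1024);
            buffer.putLong(0, 42L);                // write at an absolute offset
            System.out.println(buffer.getLong(0)); // reads back 42
        }
    }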

Peter Lawrey
  • does scale? Or does not scale? For a JDBC batch insert program I doubt he needs off heap – bwawok Nov 05 '13 at 22:11
  • @bwawok It was ambiguous and could be read either way :| I have reworded it. In short, I am saying that whatever your application does now, it is unlikely you are hitting a serious limit. – Peter Lawrey Nov 06 '13 at 05:34