0

I have data-generation code that produces records, where each record consists of multiple user-selected fields. To speed up the process, I split the work into batches so that the records are created in parallel.

For example: if I want to generate 10k records, I split the work into 5 tasks, say:

 Task 1 : create records 1-2k

 Task 2 : create records 2001-4k

 ...

 Task 5 : create records 8001-10k

I want each thread to store its records in a shared container. Once the container fills up to a limit, say the first 1k records, a separate task that is waiting to export data starts removing the records sequentially.

My first option was a HashMap, since the sequence is important, but it is not memory efficient at all: even if the map is more than 50% empty, the space it occupies on the heap stays the same until the map itself is garbage collected.

So, considering the scenario above, what is the best fit for the container?

Ninad

4 Answers

0

It is not good practice to hold records in memory when the data set is large. Push the data to persistent database storage as each batch is produced. If you keep an index ID with each record, the sequential order can be restored from there.

If you are keen on using HashMap: you can specify the load factor when you construct the object. A higher load factor limits the amount of unused space in the underlying hash table. Of course, restricting this unused space causes a performance hit, as you will have more collisions in the hash table. You will also have to tune your GC; have a look at this:

http://www.cubrid.org/blog/dev-platform/how-to-tune-java-garbage-collection/
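As a rough illustration of the load-factor suggestion, here is a minimal sketch. The class name `LoadFactorSketch`, the helper `newRecordBuffer`, and the values 1000 and 0.9 are assumptions made up for this example; only the `HashMap(int initialCapacity, float loadFactor)` constructor is standard. Note that a HashMap does not keep insertion order, so the exporter in this sketch looks records up by their sequence number.

```java
import java.util.HashMap;
import java.util.Map;

class LoadFactorSketch {

    // Hypothetical helper: builds a map sized for the expected number of
    // buffered records, using a caller-chosen load factor (default is 0.75).
    static Map<Integer, String> newRecordBuffer(int expectedRecords, float loadFactor) {
        // Capacity chosen so that expectedRecords entries fit without a resize.
        int initialCapacity = (int) Math.ceil(expectedRecords / loadFactor);
        return new HashMap<>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Map<Integer, String> buffer = newRecordBuffer(1000, 0.9f);

        // Producers store records under their sequence number.
        for (int id = 1; id <= 1000; id++) {
            buffer.put(id, "record-" + id);
        }

        // The exporter removes records in sequence order by key.
        for (int id = 1; id <= 1000; id++) {
            buffer.remove(id);
        }
        System.out.println(buffer.isEmpty());
    }
}
```

A plain HashMap is not safe for concurrent writers, so with several producer threads this would need external synchronization or a concurrent map.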

But I would still suggest changing your design as described above.

  • We are storing it to a persistent DB in cases where the frequency of retrieval is low, like 1 record per 5 secs – Ninad Jun 23 '15 at 07:14
0

You probably don't need to care about the garbage collector. Heap memory is released only when the garbage collector runs; it is never released earlier. If your program is functionally correct and does not retain any references to unused data, then the garbage collector will clean up any unused objects.

See this question here:

confusion-over-how-javas-garbage-collector-works-nodes-queue

You can use an array to store references to your objects. If you overwrite a reference with a new reference, the old object becomes eligible for collection by the garbage collector (provided nothing else refers to it). Otherwise you will have to either throw the array away or manually set any unused references to null so that the GC can reclaim the objects.
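One way the array idea could be sketched for the scenario in the question is a fixed window of slots indexed by sequence number. Everything here is an assumption for illustration: the class `SequencedSlots`, its methods, records modelled as `String`, sequence numbers starting at 1, and a single exporter thread. It also swaps the plain array for `java.util.concurrent.atomic.AtomicReferenceArray` so that several producer threads can write safely.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical fixed-size window of record slots, indexed by sequence number. */
class SequencedSlots {
    private final AtomicReferenceArray<String> slots;
    // Sequence number the exporter expects next (assumed to start at 1).
    private final AtomicLong nextToExport = new AtomicLong(1);

    SequencedSlots(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    /** Called by producers. Returns false if this sequence number is too far ahead. */
    boolean offer(long seq, String record) {
        if (seq >= nextToExport.get() + slots.length()) {
            return false; // the slot still belongs to an earlier, unexported record
        }
        slots.set((int) (seq % slots.length()), record);
        return true;
    }

    /** Called by the single exporter. Returns the next record in sequence, or null if not ready. */
    String poll() {
        long seq = nextToExport.get();
        int index = (int) (seq % slots.length());
        String record = slots.get(index);
        if (record == null) {
            return null; // the producer for this sequence number has not finished yet
        }
        slots.set(index, null);      // drop the reference so the record can be collected
        nextToExport.set(seq + 1);   // only now may a producer reuse this slot
        return record;
    }
}
```

The array itself is allocated once and never grows; only the record objects come and go. A producer that gets `false` from `offer` has to retry later, which this sketch leaves to the caller.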

You can use an ArrayList, in which case you can call clear() to empty it and release the references to the records. Or you can throw the ArrayList away and allocate a new one.
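A minimal sketch of the ArrayList variant follows. The class `BatchBuffer` and its method names are hypothetical, and records are modelled as `String` for brevity. One detail worth knowing: `clear()` drops the element references but keeps the backing array at its current capacity, and `trimToSize()` is the standard call to give that capacity back.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-threaded buffer that is filled, drained in order, and reused. */
class BatchBuffer {
    private final ArrayList<String> records = new ArrayList<>();

    void add(String record) {
        records.add(record);
    }

    /** Returns the buffered records in insertion order and empties the buffer. */
    List<String> drain() {
        List<String> batch = new ArrayList<>(records); // copy handed to the exporter
        records.clear(); // releases the references; the backing array keeps its capacity
        return batch;
    }

    /** Shrinks the backing array to the current size, for example after a large batch. */
    void shrink() {
        records.trimToSize();
    }

    int size() {
        return records.size();
    }
}
```

This sketch is not thread-safe; with several producer threads the calls would need to be synchronized.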

Also, see here: java-collections-and-garbage-collector

If you really want to avoid allocating and reallocating memory, you will have to reuse the objects that hold individual records, which is probably going to be really hard and possibly not effective anyway.

rghome
0

ArrayList is a good choice. It has a remove() method that you can use to remove your objects, and its clear() method would also help. But, as you indicated, you would have to wait for the GC to reclaim the memory.

The most memory-efficient way is to use arrays of primitive types, such as char[] and int[]. Primitive values carry no per-value object header and are not collected individually. Note that the arrays themselves are still objects on the heap and are garbage collected like any other object.

are java primitives garbage collected

Everything except primitives is an Object in Java. Objects are allocated when they are instantiated and cannot be freed explicitly; they can only become eligible for garbage collection.
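To make the primitive-array suggestion concrete, here is a small sketch that packs records into one flat `int[]`. It assumes, purely for illustration, that every record has the same number of `int` fields; the class `FlatIntRecords` and its methods are hypothetical and not part of any library.

```java
/** Hypothetical store that keeps fixed-width int records in one primitive array. */
class FlatIntRecords {
    private final int[] data;
    private final int fieldsPerRecord;

    FlatIntRecords(int recordCount, int fieldsPerRecord) {
        this.fieldsPerRecord = fieldsPerRecord;
        // One allocation for all records; no per-record or per-field objects.
        this.data = new int[recordCount * fieldsPerRecord];
    }

    void set(int record, int field, int value) {
        data[record * fieldsPerRecord + field] = value;
    }

    int get(int record, int field) {
        return data[record * fieldsPerRecord + field];
    }
}
```

The whole store is a single heap object, so there is nothing to collect per record, and the same array can be overwritten for the next batch.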

Chaitanya P
-1

Reading these posts might help:

http://java.dzone.com/articles/batch-processing-best

http://java.sys-con.com/node/415321