3

In my application I would be using a Map.

  • Multiple threads would be writing data to this map. There would be a very large number of write operations.
  • However, the data that is fed to the map has a different key during every write.
  • The data in the map would not be read at any point in the application.
  • Once in a while, the content would be dumped to a file.

I would like to know the following :

  1. In this case, would it be necessary to synchronize the write method?
  2. Does a ConcurrentHashMap suit my needs?
  3. If not, what would be the right Map implementation to use in this case?
randers
sujith
  • 2
    _"The data in the map would not be read at any point in the application"_ and then _"Once in a while the content would be dumped to a file"_ - so how do you dump to a file without reading the map? – Sean Bright Jul 21 '15 at 14:55
  • 1
    These two points are incompatible: "The data in the map would not be read at any point in the application" and "Once in a while the content would be dumped to a file". The content cannot be dumped to a file without reading out the full contents of the map. – John Bollinger Jul 21 '15 at 14:55
  • 1
    You might also want to consider keeping one map per thread and then merging them when you dump to the file. Then, access to the maps will only block when you dump to the file instead of blocking for every write. – xp500 Jul 21 '15 at 14:57
  • Do you need to have keys serialized into file, or only values ? – John Jul 21 '15 at 15:03
  • `ConcurrentHashMap` is getting a lot of thumbs up in the answers. It *probably* suits your need, but that's only if file dumps will never overlap, and if your map does not need to accept `null` as a key. Also not if you perform any aggregate operations (such as `putAll()` or `clear()`) and require them to appear atomic with respect to the dump-to-file operation. – John Bollinger Jul 21 '15 at 15:05
  • @Bollinger - By dumping I mean that the entire content of the map is dumped into the file by calling the toString() method on the map and passing the result as an argument to a PrintWriter. Only the data is important in the application, so that is how I am doing it. – sujith Jul 22 '15 at 17:26
  • @sujith, By "only if file dumps will never overlap" I meant overlap in *time*, not in data, and in fact I now think that's not so much an issue after all. There is no reason to believe, however, that any particular mechanism for extracting all the contents of the map (e.g. by using `toString()`) has different characteristics with respect to synchronization requirements than any other one. – John Bollinger Jul 22 '15 at 17:42
  • @Bollinger : Thank you for the detailed explanation. This answers my question perfectly. – sujith Jul 23 '15 at 09:39

6 Answers

8

Focusing on these points:

  • The data that is fed to the map has a different key during every write

  • The data in the map would not be read at any point in the application

You don't need a Map at all. I'm assuming that when you state that the data in the map won't be read, you mean that you're not doing map.get(someKey) but instead you will traverse the whole map to store the data in the file (or whatever data source you use).

This point:

  • Once in a while the content would be dumped to a file

Reinforces the recommendation above.

Focusing on this point:

  • Multiple threads would be writing data to this map. The write operations are too many.

The best recommendation is to use a BlockingQueue. As an implementation, you may use LinkedBlockingQueue.

If you dump the data from the Map using Java synchronization and want or need to recover this data in the form of a Map, then use a ConcurrentHashMap. If this is not part of your use case because you will read the data from the file in other ways, then avoid using a Map and use a BlockingQueue.
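
A minimal sketch of the queue-based approach, offered as an illustration rather than a definitive implementation. The class name `QueueDumpSketch`, the helper methods `fill` and `drain`, and the record format are all assumptions made for this example; only `LinkedBlockingQueue`, `drainTo` and the executor calls come from the JDK. Several writer threads enqueue records without any external synchronization, and the dump step moves whatever is queued into a list that could then be written to a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueDumpSketch {

    /** Starts `writers` threads that each enqueue `perWriter` records, then waits for them. */
    static BlockingQueue<String> fill(int writers, int perWriter) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.execute(() -> {
                for (int i = 0; i < perWriter; i++) {
                    // The queue is thread-safe, so no synchronized block is needed here.
                    queue.add("writer" + id + "-record" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return queue;
    }

    /** Moves everything currently queued into a list (hypothetical stand-in for the file dump). */
    static List<String> drain(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = fill(4, 1000);
        List<String> batch = drain(queue);
        System.out.println(batch.size() + " records drained, " + queue.size() + " left in queue");
    }
}
```

In a real application the writers would keep running while a separate thread calls `drain` periodically; if the keys matter, the queued element could carry both key and value.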

Luiggi Mendoza
  • You missed the part where the whole map is dumped into file. Since keys are also content of the map, i would argue that this answer doesn't address the question correctly. – John Jul 21 '15 at 14:59
  • @user3360241 since the keys have no purpose, given that they're always unique, that part is really pointless. Unless the key is relevant data that cannot be found inside the value of the `Map` (which nobody knows except OP), it doesn't seem good. Or, on the other hand, if it has value, store this *key* in a field in the value and save it as well. Again, the `Map` option is useless for this use case. – Luiggi Mendoza Jul 21 '15 at 15:01
  • What if the op wanted to rebuild the Map from file within another application ? But as you say key could be embedded into value for most cases, but not for this one. – John Jul 21 '15 at 15:07
  • @user3360241 since that is not pointed out by OP in the question, then there's no need to address it. Also, it's not that hard to build a map from a file. And OP's never talking about a serialization mechanism. – Luiggi Mendoza Jul 21 '15 at 15:09
  • @user3360241, a blocking queue is a perfectly fine approach. It is simply a question of the type of elements to be enqueued. That could be, for example, a suitably-parameterized `Map.Entry` if need be. – John Bollinger Jul 21 '15 at 15:12
1

To 1: The Map interface does not guarantee any synchronization, especially not on writes. Looking at the unsynchronized implementations (HashMap, IdentityHashMap, LinkedHashMap, TreeMap and WeakHashMap), their documentation states, in these or similar words, that

if multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

To 2 and 3: If you were using a ConcurrentHashMap, you would not have to worry about synchronization. But I agree with Luiggi Mendoza's answer: do not use a Map.
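
For the case where a Map is kept anyway, here is a small sketch of concurrent writes to a `ConcurrentHashMap` followed by a dump. The class name `ConcurrentMapSketch`, the helpers `fill` and `dump`, and the key/value format are assumptions for illustration only. The writers call `put` with distinct keys and no external locking; the dump iterates the entry set, which for `ConcurrentHashMap` is weakly consistent (it does not throw `ConcurrentModificationException`, but it may or may not reflect writes that happen during the iteration).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentMapSketch {

    /** Starts `writers` threads that each put `perWriter` distinct keys, then waits for them. */
    static Map<String, String> fill(int writers, int perWriter) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.execute(() -> {
                for (int i = 0; i < perWriter; i++) {
                    // No synchronized block: ConcurrentHashMap handles concurrent puts itself.
                    map.put("writer" + id + "-key" + i, "value" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map;
    }

    /** Renders one "key=value" line per entry (hypothetical stand-in for the file dump). */
    static String dump(Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> map = fill(4, 1000);
        System.out.println("entries written: " + map.size());
        System.out.println("lines in dump: " + dump(map).split("\n").length);
    }
}
```

Note that `ConcurrentHashMap` rejects `null` keys and values, and that the dump is not an atomic snapshot if writers are still active while it runs.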

Turing85
0
  1. No
  2. Yes

However, I find that this is a contradiction:

  • The data in the map would not be read at any point in the application
  • Once in a while the content would be dumped to a file.

How can you dump to a file without reading it?

I think it's safe to say that ConcurrentHashMap can handle that case anyway, so go for it.

Viktor Mellgren
-1

If you really need a Map, a ConcurrentHashMap is what you need. Read more about it here.

Nicklas Jensen
-1

As you said too, ConcurrentHashMap seems to meet your requirement. It is thread-safe without synchronizing the whole map. Reads can happen very fast, while writes are done with a lock.

user3359139
-3

All your keys are unique, so you don't necessarily need synchronization to ensure the integrity of the map, but you do need it when you actually write to the file. Either a ConcurrentHashMap or a normal synchronized map would suit you well. You can also do without a Map: simply store each key/value pair in some object and store those objects in a synchronized list.
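
A sketch of the "normal synchronized map" option, assuming that `Collections.synchronizedMap` is the wrapper meant here. The class name `SynchronizedMapSketch` and the helpers `fill` and `dump` are hypothetical. Individual `put` calls are synchronized by the wrapper, but its Javadoc requires that any iteration over its collection views be done while holding the wrapper's own lock, which is what the dump does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapSketch {

    /** Starts `writers` threads that each put `perWriter` distinct keys, then joins them. */
    static Map<Integer, String> fill(int writers, int perWriter) throws InterruptedException {
        Map<Integer, String> shared =
                Collections.synchronizedMap(new HashMap<Integer, String>());
        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int base = w * perWriter;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < perWriter; i++) {
                    // Each put acquires the wrapper's lock internally.
                    shared.put(base + i, "value" + (base + i));
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return shared;
    }

    /** Renders one "key=value" line per entry while holding the wrapper's lock. */
    static String dump(Map<Integer, String> shared) {
        StringBuilder out = new StringBuilder();
        synchronized (shared) { // blocks writers for the duration of the iteration
            for (Map.Entry<Integer, String> e : shared.entrySet()) {
                out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> shared = fill(4, 500);
        System.out.println("entries written: " + shared.size());
        System.out.println("lines in dump: " + dump(shared).split("\n").length);
    }
}
```

The trade-off against `ConcurrentHashMap` is that every write and the whole dump contend for a single lock, which may matter when writes are very frequent.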

Geek
  • 1
    Having distinct keys in no way means that writes to the map do not require synchronization. – John Bollinger Jul 21 '15 at 15:00
  • A quote from e.g. [`TreeMap`'s Javadoc](https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html): *If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally*. – Turing85 Jul 21 '15 at 15:04
  • @JohnBollinger and Turing: Yeah, I do understand that. My question is: if all keys that are ever going to be inserted are unique and the Map is used only for writing, what could be the possible harm? For example, a write from one thread might not be visible to another thread, fair enough, so the only harm would be when you write to the file, as you might not see all updates. Correct? Did I miss something? I agree the above is not a perfect comment. – Geek Jul 22 '15 at 09:51
  • It's not merely a question of not seeing updates, though that possible problem should not be discounted. The map's internal state could become *corrupted* by unsynchronized concurrent updates. The keys all being distinct does not protect from that. More generally, program behavior is not well defined in the absence of proper synchronization. – John Bollinger Jul 22 '15 at 13:04
  • @JohnBollinger: And how exactly would the map get corrupted? Multiple threads writing to the same bucket because the hash code is the same? – Geek Jul 22 '15 at 14:07
  • *Any* piece of state modified by multiple threads presents an opportunity for corruption. Two or more putting items in the same bucket (which does not depend on them having equal hash codes) would certainly produce such a situation. So also would a thread causing a rehashing to occur as a result of adding an element. Without studying the implementation, you cannot know what other possibilities there may be. – John Bollinger Jul 22 '15 at 14:59
  • @John Bollinger : If multiple threads are writing to the same map , could the data corruption be avoided using ConcurrentHashMap or do the writes still need to be externally synchronized ? – sujith Jul 22 '15 at 17:30
  • @sujith, `ConcurrentHashMap` is all about avoiding the need for external synchronization, so the answer to your question is "yes", subject to the provisos I already mentioned in comments on the original post. – John Bollinger Jul 22 '15 at 17:44
  • @JohnBollinger : That explains it. Thanks. – Geek Jul 22 '15 at 17:45