0

I am processing Big Data in a multithreading Java environment and i need a very efficient Sparse Vector Library that i can use. Any ideas? I just need it to have a super fast vector addition nothing more. The only operation i'm using is to add two sparse vectors but i'm going to do that very frequently on a shared hashtable. So each item in hashtable is . I do something like this:

anotherVector = initVector()
lock.acquire()
wordVector = hashtable.get(word)
wordVector = wordVector + anotherVector
hashtable.put(word, wordVector)
lock.release()

I need the addition task to be very fast so that i can release the whole hashtable resource as soon as possible for other threads.

BTW If there is any other ideas on how to implement this (e.g. using another data structure or another design) i'd be happy to hear about it. The only point is that i need it to be efficient for English Wikipedia.

Averroes
  • 4,168
  • 6
  • 50
  • 63
Ash
  • 3,428
  • 1
  • 34
  • 44
  • 1
    The most efficient sparse vector in the Java JDK is the `HashMap`. You might consider using a `ConcurrentHashMap` for your `hashtable` rather than locking the whole structure too. It is fairly common to lock the _key_ rather than the whole `Map` as you are currently doing. – Boris the Spider Mar 28 '14 at 08:46
  • @BoristheSpider I need a ConcurrentHashMap of Hashmaps yes? because for each word in a corpus i have a sparse vector so i'll have a ConcurrentHashMap(String, HashMap)? another point is that when i need to update the vector of a string i fetch it update it and put it back. I think i'll need to lock the key before getting the vector untill updating it. is it possible to lock that specific key in ConcurrentHashMap? – Ash Mar 28 '14 at 09:00
  • 1
    Yes, exactly, a `ConcurrentHashMap>` or whatever. The ability to lock keys is not build into the `ConcurrentHashMap` but you can do that externally fairly easily. – Boris the Spider Mar 28 '14 at 09:06
  • 1
    I answered a [somewhat similar question here](http://stackoverflow.com/a/22141825/2071828), it demonstrates a two-map key locking approach. You can also use key wrappers. – Boris the Spider Mar 28 '14 at 09:13
  • @BoristheSpider Yess. Your idea is great. I love it. It seems something that will certainly work. So we create a lock for each key and we lock the entire hashmap on that specific key-lock. Thanks for this wise idea. – Ash Mar 28 '14 at 09:37
  • @BoristheSpider I tried your design and it worked great. The only point is that putIfAbsent returns null if the key doesn't exist so should check if lock is null use the newly created lock. I couldn't comment this in your answer here http://stackoverflow.com/a/22141825/2071828 you can change it if you want. – Ash Mar 28 '14 at 16:51
  • Glad it helped! Good spot - will fix that when I have some time. – Boris the Spider Mar 28 '14 at 19:47

0 Answers0