5

I implemented a sparse matrix as List<Map<Integer,Double>>.
To get all entries of row i I call list.get(i).keySet(). How expensive is this call?

I also used the trove library for an alternative implementation as List<TIntDoubleHashMap>.
What's the cost of calling list.get(i).keys(), here?

Do you have any further ideas of how to implement an efficient sparse matrix?
Or can you provide a list of existing implementations in java?

user306708
  • 921
  • 1
  • 9
  • 12

3 Answers3

5

Depends on the class implementing List and Map. If you are using a List class implementing java.util.RandomAccess (ie. ArrayList), then a call to get(i) is O(1). If it is a LinkedList, it will be O(n).

-- Edited to show the following code snippet (since verdy_p below doesn't read well, and likes to go off the tangent): --

// In HashMap.java, line 867, JDK 1.6.0.24, how much more
// constant time do we want?

public Set<K> keySet() {
    Set<K> ks = keySet;
    return (ks != null ? ks : (keySet = new KeySet()));
}

-- end of edit --

A call to keySet() on most Map implementations will be constant time.

Regarding traversing the keySet() If you are using an array-backed Map implementation (like HashMap), keySet() relies on entrySet() which returns an internal iterator backed by an array. So iteration of keySet() is O(n).

I would also assume that is the case for most (if not all) Map implementations that are backed by arrays.

For SortedMap implementations (like TreeMap), iterating on its keys will be akin to iterating on a tree from lowest to greatest key. This is equivalent to a failed binary search which is O(n).

Both cases appear to be O(n). If you use Eclipse, you can actually look at the code implementing the java classes and get a better idea of their complexity.

For classes under java.util.concurrent (like ConcurrentHashMap), you'll have to take other considerations to determine how expensive they are.


To expand a bit more, if you use a linked list, list.get(i).keyset() will be O(n). With ArrayList, it will be O(1). Traversing the keyset will depend on whether you use an array-backed Map (HashMap) or a SortedMap (TreeMap). In both cases, traversing will be O(n) with the former being significantly faster than the later since array traversal will always be faster than traversing through pointers (or references in this Java specific case.)

Now, if you take both list.get(i).keySet() and the iteration of the set into account, with a linked list implementation, that will be O(n^2). So instead of doing list.get(i).keySet(), you should use an iterator (see pseudocode below, it obviates generic syntax for clarity)

This is O(n^2) for lists that do not implement java.util.RandomAccess (like LinkedList):

for( int i = 0; i < list.size(); i++ )
{
   Set keySet = list.get(i).keySet();
   for( Integer key : keySet.iterator() )
   {
      ... stuff (assuming constant time) ...
   }
}

This is O(n) for that same type of List implementations:

for( Map m : list.iterator() )
{
   for( Integer key : m.keySet() )
   {
      ... stuff (assuming constant time) ...
   }
}
luis.espinal
  • 10,331
  • 6
  • 39
  • 55
  • You wrote "A call to keySet() on most Map implementations will be constant time." This is plain false and just based on the **false assumption** that a Hashmap avoids collisions most of the time. In fact a Hashmap only avoids a singnificant number of collisions, depending on its filling factor (the more the Hashmap is filled, the more you get collisions). But a default Hashmap uses a fill factor of about 85%, which means that you'll get 1 collision on about half cases (for random accesses). An in average you'll get about 1.5 values to traverse to get the good one. – verdy_p Mar 22 '12 at 03:28
  • And this also depends on how good the values are distributed on their computed hash(): if your hash() function is not correctly written to correctly randomize the bits of the all the possible source values they intend to hash, the results will be much poorer, and you'll get many more collisions. So you should have written ""A call to keySet() on most Map implementations will be performed in a O(1) time." without warranty that this O(1) time will small (there are even worst cases were it could be O(n) with a bad hash() function !) – verdy_p Mar 22 '12 at 03:34
  • Note that the builtin hash() functions for native types (byte, short, int, long, float, double) are extremely basic and do **not** create really randomized distribution of bits in the 32-bit patterns returned as hash ! This means that HashMap does not warranty at all a constant time, and not even a O(1) access time for many easily reproducible worst cases (in that case the HashMap behaves like an unordered list with O(n) access time!, only mitigated by the HashMap fill factor, which is too large in my opinion and should be reduced to 50% instead of the default 85%....) – verdy_p Mar 22 '12 at 03:39
  • 1
    @verdy_p, dude, READ this slowly, once or twice before replying. Look at the JDK source code of HashMap.keySet() (HashMap.java, line 867, JDK 1.6.0.24) and tell me that it does not execute in constant time. **Don't reply back until you do so**. Better yet, look at polygenelubricants's reply (Apr 8 '10). Calling keySet() does not involve hashing, searching or collision detection of any time. I know there is always the basal urge to to post something and show what we know, but c'mon, at least read the sentence you are criticizing. – luis.espinal Mar 25 '12 at 16:56
  • The main reason was that the use of boxed types was really slow for managing lots of numeric entries in a typical situation where you will handle lots of values in multiple sparse matrix constantly reallocated. The garbage collector is not very fast, even for boxed native types that have small constant sizes allocated in separate pools. – verdy_p Jul 17 '12 at 23:27
  • Also I wrote "on most Map implementations". I did not restrict my self to only the "Hashmap". And yes I observe collision lists even with Hashmap, and real slowdowns with fill factors above 85%. yes keySet is returned in constant time, but not when you traverse it. It also has an implementation behind this interface. – verdy_p Jul 17 '12 at 23:29
  • Oh I also forgot speaking about your insulting word "dude". You also made false assertions about the fact I would hae not read the implementation of Hashmap in the JDK. I have read it since many years and even found that its implementation was really poor for many frequently used cases (but the worst thing is the very poor implementation of the hash() method for boxed native types; notably number types). Definitely you'll want to reconsider the use of the default SLOW/INEFFICIENT Hashmap and implement a better Map. – verdy_p Feb 20 '14 at 11:22
  • "Dude" is offensive? And you replied to that almost two years after your last reply? Silence of the Lambs!!! Seriously, instead of using your time to recollect ancient grievances just call Michael Moore so that he can make a dramatic documentary with you at the center of it all. – luis.espinal Feb 20 '14 at 14:48
  • I received a notification here; even if the intial post was two years old, that topic gets visited. There's no tie li,it to reply and apparently someone did: there are no limbs on this site – verdy_p Feb 20 '14 at 16:05
3

It's as cheap as it gets, since it's a view.

From the jdk7 source line 884:

public Set<K> keySet() {
    Set<K> ks = keySet;
    return (ks != null ? ks : (keySet = new KeySet()));
}

Trove is probably faster since unlike the Java Collection Frameworks it can work directly with primitives without expensive boxing/unboxing.

polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
2

According to Sparse matrices / arrays in Java, the Colt library includes this functionality; diving into their Javadoc API, this seems to be true, and times are included.

Additionally, your implementation seems to use no column-wise sparsity (you only have hashmaps on the rows). Theirs does, and is optimized for ints and doubles, as is the case in Trove (but not in the standard Java case, which uses objects with considerable overhead). I recommend Colt.

Community
  • 1
  • 1
tucuxi
  • 17,561
  • 2
  • 43
  • 74