8

Coming from a mostly C++ background I am now writing some Java in anger. Something I find basic in C++ using the STL seems to be more cumbersome in Java than I think it should be. My conclusion is that there is probably a better Java idiom I don't grok yet. Here is an example using pseudo-code.

I have a collection of things that have a natural ordering relation based on some member variables which happen to be strings.

class Thing
{
   String key1;
   String key2;
}

In C++ I might define an ordering operator<(Thing,Thing) and put these in a std::set. E.g.

///
/// @brief
/// provide a total order for 'Things' using key1 and key2
///
bool operator<(const Thing& a, const Thing& b)
{
  if (a.key1 < b.key1) return true; 
  else if (a.key1 > b.key1) return false; 
  else return a.key2 < b.key2;
} 

I can then find elements in O(log N) time using set::find for the case of having a Thing. Using additional overloads of operator<(). I can search having just key1 or having both key1 and key2 using std::lower_bound or std::equal_range. For example:

struct Ordering
{
   /// A strict weak ordering not a total ordering
   bool operator()(const Thing& A,const std::string& key1) const;
}

const_iterator iter = std::lower_bound(someThings.begin(),
                                       someThings.end(),
                                       key1,
                                       Ordering());

To make this less abstract imagine key1 is name and key2 is version. I can ask do we have any software called Foobar or more specifically do we have Foobar v1.0.

On the face of it, the most direct equivalent of std::set in Java appears to be TreeSet The ordering can be implemented by subclassing the Comparator interface. However for what I'm talking about it looks like multiple Maps would be needed to do this in Java. In C++ would only bother using an associative container like std::map if I wanted to change the value. In a C++ std::set as in a Java TreeSet the value is its own key. However, in C++ I can write comparators to compare "Thing" with "std::string" using key1 or key2 as appropriate and find a specific thing in a std::set of them. It looks to me like you have to use Map to do this in Java. Otherwise (because Comparator has only one type parameter) you end up with a mess like:

public static class Order implements Comparator<Object>
{
  @Override
  @Constant
  public int compare(Object a, Object b)
  {
     String aString;
     String bString;         
     if (a instanceof String)
     {
        aString = (String)a;
     }
     else if (a instanceof Thing)
     {
        aString = ((Field)a).getKey1();
     }
     else
     {
        throw new ClassCastException("String or Field object expected.");
     }
     if (b instanceof String)
     {
        bString = (String)b;
     }
     else if (b instanceof Thing)
     {
        bString = ((Field)b).getKey1();
     }
     else
     {
        throw new ClassCastException("String or Field object expected.");
     }
     return aString.compareTo(bString);
  }
};

However, If you do that you can (in class Thing) write:

Set<Thing> things = new TreeSet<Thing>(new Order());

boolean hasFieldWithKey1(final String key1) 
{
   return this.fields.contains(key1);
}

with a Java Set you can only test for existence but not retrieve the object you are searching for. e.g. you can't do

Field getFieldWithKey1(final String key1) 
{
   return this.fields.floor(key1);
}

because methods like floor() only accept objects of the value type (i.e. Thing)

The obvious solution is to use a Map for each key.

Map<String,Thing> thingsByKey1 = new TreeMap<Thing>(new Order());

Coming from a C++ background this seems unnecessarily bloated. Why should I store the key again when thing already contains it? If I have two keys its even worse. I will need two maps.

Map<String,Thing> thingsByKey1 = new TreeMap<Thing>(new OrderByKey1());
Map<String,Thing> thingsByKey2 = new TreeMap<Thing>(new OrderByKey2());

I am now duplicating not just the key but creating additional unnecessary tree data structures (or HashMaps with better runtime performance) as well. For the ordering implementation as above this is could also be 'just plain wrong' as by itself each key forms only a partial order not a total order on a set of things.

I have seen questions about searching here answered with use of linear search which is almost always the worst choice. E.g.

Finding all objects that have a given property inside a collection

I note that there is a version of BinarySearch that accepts a Comparator object as an argument but returns it the index of the element rather than the element itself. This means there is an unnecessary call to get() after using it (assuming the collection supports it).

So what is the Java way of doing this efficiently in both time and space?

Community
  • 1
  • 1
Bruce Adams
  • 4,953
  • 4
  • 48
  • 111
  • The question isn't bad, but I'm confused how you're using the `std::set` to search on different keys in O(log N) time. `std::set` uses only 1 comparator class, and only compares the `value_type` of the set, so additional overloads won't help when using `find()`, `lower_bound()`, or `upper_bound()`. And if you are resorting to `std::find()`, you're falling into a linear search. – Dave S Aug 01 '12 at 18:26
  • Using `std::find` in a set is a loss of time. `std::set` is `O(log N)`, while `std::find` on a set will be `O( N )` (in the total number of operations, even if the number of comparisons is still `O(log N)`) – David Rodríguez - dribeas Aug 01 '12 at 19:21
  • Your `operator<` implementation causes undefined behavior if `a.key1 > b.key1`, because it doesn't return a value for that case. – fredoverflow Aug 01 '12 at 19:27
  • My wording was poorly chosen (actually plain wrong). I'm in fact using std::lower_bound with multiple comparison methods and if I use find at all it will be set::find or map::find and not std::find. In this example there is a single total order to the set but several ways you might want to match against it. I'll edit to explain better and correct the operator< definition. – Bruce Adams Aug 02 '12 at 09:20

1 Answers1

4

The Java way to do this is, yes, to use Map.

Coming from a C++ background this seems unnecessarily bloated. Why should I store the key again when thing already contains it?

This isn't as much overhead as you think. You're storing one extra reference to the String, for a total cost of...4 bytes. (Actually, the cost is zero: the TreeSet implementation takes exactly as much memory as TreeMap.)

If you want to search with both keys, you can use a Comparator<Thing> that compares both keys, or make Thing implement Comparable<Thing>, and then maintain a TreeSet<Thing>. That's much more compact than the...unpleasant Comparator you wrote above. If you want to search with one key, just use a Map<String, Thing>. If you really, really want to search with both, then maintain them both. (In practice, I've almost never had to do this...and the authors of the JDK Collections framework didn't think you'd need to do this very often, either.)

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • I guess I'm curious about two things with this question. How does the Java community tend to think and what do they do when they really need to write efficient code. For example, I could use a sorted vector instead of a std::set in C++ and lose the tree altogether at the expense of insertions being more expensive. I could do the same in Java by using ArrayList and Collections.sort. A Java person might instead find something more apt in Apache Commons rather than rolling their own. – Bruce Adams Aug 02 '12 at 10:00
  • The Collections framework, by itself, tends to push you towards using consistent abstractions. For example, a "sorted `List`" wouldn't be able to satisfy the contract of `add(int, E)`, which is specified to add an element at a specific position. That said, I've never found that this significantly hurts performance, and the implementation in the JDK is usually better than anything you or I would write anyway. – Louis Wasserman Aug 02 '12 at 10:09
  • Searching around I suspect your iterables interface in Guava would better suit my programming style. So it looks like, in this case, the implementation in the JDK is not better than anything _you_ would write (i.e. at google) - http://stackoverflow.com/questions/587404/java-finding-objects-in-collections I think I need to ask myself a different question which libraries should I use and why. I also find myself asking why there is an apache commons collections library too. – Bruce Adams Aug 02 '12 at 12:33
  • In all honesty, none of the answers to the linked question -- even the ones using Guava -- perform significantly better than a straightforward for loop, and honestly, I don't think most of them are as readable. (See e.g. [this](https://code.google.com/p/guava-libraries/wiki/FunctionalExplained#Caveats) page from the Guava wiki.) Wrt Apache, Guava came after Apache, addressing the perceived problems with the Commons library that a) Apache didn't provide generics, and b) the Apache libraries tended to violate the Collection contracts. – Louis Wasserman Aug 02 '12 at 19:09
  • I know why Guava was created. Its less clear why Apache commons collections was needed in the first place. From a c++ background Guava's significant contributions are functional style algorithms and immutable collections which are provided out of the box with STL in C++. – Bruce Adams Aug 14 '12 at 11:00
  • Prompted to revisit this by an unexplained down-vote (which remains unexplained). Its been several years now with no other answers. Work has moved me back in the C++ direction and I haven't used Java much in that time. I think its time to accept this answer. If I find a better answer myself next time I move in the Java direction I'll post it. – Bruce Adams Aug 04 '15 at 11:23