47

I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.

Any ideas/theories as to why it was designed like this?

I usually have to hack around this by using a Map and making the key and the value same, or something like that.

EDIT: I don't think people understand my question so far. I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true.

As to why I would want this behavior, typically .equals() does not take into account all the properties of the object. I want to provide some dummy lookup object and get back the actual object instance in the Set.

skaffman
  • 398,947
  • 96
  • 818
  • 769
GreenieMeanie
  • 3,560
  • 4
  • 34
  • 39
  • I really want to do this, but with a key of a Map. Like, I want Map.getKey(K k) to return k' where k.equals(k'). Can I do this with your hack? Or will I have to make a Pair, and change my Map to Map>? – Jayen Oct 20 '11 at 10:42
  • "Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set." http://en.wikipedia.org/wiki/Set_(abstract_data_type) – GClaramunt Jul 17 '12 at 00:00

24 Answers

30

While the purity argument does make the method get(Object) suspect, the underlying intent is not moot.

There are various class and interface families that slightly redefine equals(Object). One need look no further than the collections interfaces. For example, an ArrayList and a LinkedList can be equal; their respective contents merely need to be the same and in the same order.

Consequently, there are very good reasons for finding the matching element in a set. Perhaps a clearer way of indicating intent is to have a method like

public interface Collection<E> extends ... {
  ...
  public E findMatch(Object o) throws UnsupportedOperationException;
  ...
}

Note that this API has value beyond Set.

As to the question itself, I don't have any theory as to why such an operation was omitted. I will say that the minimal spanning set argument does not hold, because many operations defined in the collections APIs are motivated by convenience and efficiency.

Dilum Ranatunga
  • 13,254
  • 3
  • 41
  • 52
  • 7
    So, in summary, we don't have a very good reason for not having a collection with this sort of functionality. – Johnathon Sanders Jul 10 '12 at 14:01
  • 4
    The answer to many "why does XX not do YY" questions is "Because someone didn't think it should". Actually, that's the real answer to nearly all such questions. A better question might be "Are there any reasons someone designing an XX shouldn't have it do YY". I like that you found the implied question and answered "In this case, there are reasons why someone designing an XX should do YY, even though Java's XX doesn't happen to do so". – supercat Dec 04 '12 at 17:09
  • One argument against adding this sort of functionality is that one shouldn't "sanction" impure .equals or make the API harder to understand for people not thinking along these lines. Adding support for less pure definitions of .equals makes those definitions more likely to appear. Obviously this is less than comforting when you have one for good reason. – Zachary Vance Jan 30 '13 at 20:40
  • 1
    In many languages, java in particular: a.equals(b)!= (a==b) so a good legitimate example would be a string cache. because reference equality and object equality differ, you will have multiple copies of essentially the same string. a set would be the optimal way to store and seek the object - allowing you to retrieve the previous reference and save the extra memory. reference replacement is the primary example i know that doesn't break the intention of equals. and for those that assume that all Sets are really maps, look at trove's set which is more memory efficient than a map impl. – NightDweller Oct 17 '13 at 09:47
  • @NightDweller: The savings could be even more dramatic with more complex nested immutable types. References to large immutable collections containing the same items may be semantically equivalent, but that doesn't mean that replacing all references to identical collections with a reference to one of them couldn't offer order-of-magnitude speedups. Note that for such purposes, a weak hash set would be optimal, but `WeakHashMap` would not be suitable because it holds strong references to values. – supercat Dec 26 '13 at 02:23
  • To add a supporting argument, other set APIs do offer a way to get the object. For example, Cocoa's `NSSet` class contains both a `-containsObject:` method, which tests whether an object is contained and returns a boolean, and a `-member:` method to get the object that is equal to the given object. The OCaml standard library's Set contains both a `mem` function to test for membership and a `find` function to get the equal element. – newacct Dec 28 '13 at 20:38
15

The problem is: Set is not for "getting" objects; it is for adding objects and testing for presence. I understand what you are looking for; I had a similar situation and ended up using a map with the same object as key and value.

EDIT: Just to clarify: http://en.wikipedia.org/wiki/Set_(abstract_data_type)

GClaramunt
  • 3,148
  • 1
  • 21
  • 35
  • Where is your source for this definition? This isn't stated in the javadocs anywhere that I've found. Also, the english definition of a set is: "a number, group, or combination of things of similar nature, design, or function". So, who says it's just for adding and testing of presence? Just because we don't have a get method exposed, doesn't mean that can't be changed in the future. – Johnathon Sanders Jul 10 '12 at 13:53
  • "Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set." http://en.wikipedia.org/wiki/Set_(abstract_data_type) Just because a CompSci term has the same name than in english doesn't mean they mean the same :) – GClaramunt Jul 12 '12 at 23:04
  • 1
    Very true about the english definition.. it was my attempt to find a formal definition. Thank you for the source link, I have now been taught! :) – Johnathon Sanders Jul 13 '12 at 19:32
6

I had the same question in a Java forum years ago. They told me that the Set interface is already defined and cannot be changed, because that would break the existing implementations of the Set interface. Then they started to claim bullshit, like you see here: "Set does not need the get method", and started to drill into me that a Map must always be used to get elements from a set.

If you use the set only for mathematical operations, like intersection or union, then maybe contains() is sufficient. However, Set is defined in the collections framework to store data. I explained the need for get() in Set using the relational data model.

In what follows, an SQL table is like a class. The columns define attributes (known as fields in Java) and records represent instances of the class, so an object is a vector of fields. Some of the fields are primary keys; they define the uniqueness of the object. This is what you do for contains() in Java:

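class Element {

        public int hashCode() { return sumOfKeyFields(); }
        public boolean equals(Object o) {
            if (!(o instanceof Element)) return false;
            Element e = (Element) o;
            // compare the key fields only; the remaining fields are ignored
            return keyField1.equals(e.keyField1) && keyField2.equals(e.keyField2); // && .. for further key fields
        }
}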

I'm not aware of DB internals. But you specify the key fields only once, when you define a table: you just annotate the key fields with @primary. You do not specify the keys a second time when you add a record to the table. You do not separate keys from data, as you do in a mapping. SQL tables are sets. They are not maps. Yet they provide get() in addition to maintaining uniqueness and the contains() check.

In "Art of Computer Programming", introducing the search, D. Knuth says the same:

Most of this chapter is devoted to the study of a very simple search problem: how to find the data that has been stored with a given identification.

You see, data is stored with identification. Not identification pointing to data, but data with identification. He continues:

For example, in a numerical application we might want to find f(x), given x and a table of the values of f; in a nonnumerical application, we might want to find the English translation of a given Russian word.

It looks like he starts to speak about mapping. However,

In general, we shall suppose that a set of N records has been stored, and the problem is to locate the appropriate one. We generally require the N keys to be distinct, so that each key uniquely identifies its record. The collection of all records is called a table or file, where the word "table" is usually used to indicate a small file, and "file" is usually used to indicate a large table. A large file or a group of files is frequently called a database.

Algorithms for searching are presented with a so-called argument, K, and the problem is to find which record has K as its key. Although the goal of searching is to find the information stored in the record associated with K, the algorithms in this chapter generally ignore everything but the keys themselves. In practice we can find the associated data once we have located K; for example, if K appears in location TABLE + i, the associated data (or a pointer to it) might be in location TABLE + i + 1

That is, the search locates the key field of the record, and it should not "map" the key to the data. Both are located in the same record, like the fields of a Java object. That is, the search algorithm examines the key fields of the record, as it does in a set, rather than some remote key, as it does in a map.

We are given N items to be sorted; we shall call them records, and the entire collection of N records will be called a file. Each record Rj has a key Kj, which governs the sorting process. Additional data, besides the key, is usually also present; this extra "satellite information" has no effect on sorting except that it must be carried along as part of each record.

Nor do I see any need to duplicate the keys in an extra "key set" in his discussion of sorting.

... ["The Art of Computer Programming", Chapter 6, Introduction]

entity set is collection or set all entities of a particular entity type [http://wiki.answers.com/Q/What_is_entity_and_entity_set_in_dbms] The objects of single class share their class attributes. Similarly, do records in DB. They share column attributes.

A special case of a collection is a class extent, which is the collection of all objects belonging to the class. Class extents allow classes to be treated like relations

... ["Database System Concepts", 6th Edition]

Basically, class describes the attributes common to all its instances. A table in relational DB does the same. "The easiest mapping you will ever have is a property mapping of a single attribute to a single column." This is the case I'm talking about.

I'm so verbose on proving the analogy (isomorphism) between objects and DB records because there are stupid people who do not accept it (to prove that their Set must not have the get method)

You see in the replies how people who do not understand this say that a Set with get would be redundant? It is because their abused map, which they insist on using in place of a set, introduces the redundancy. Their call to put(obj.getKey(), obj) stores two keys: the original key as part of the object and a copy of it in the key set of the map. The duplication is the redundancy. It also involves more bloat in the code and wastes memory at runtime. I do not know about DB internals, but the principles of good design and database normalization say that such duplication is a bad idea - there must be only one source of truth. Redundancy means that inconsistency may happen: the key maps to an object that has a different key. Inconsistency is a manifestation of redundancy. Edgar F. Codd proposed DB normalization just to get rid of redundancies and the inconsistencies they cause. The teachers are explicit on normalization: Normalization will never generate two tables with a one-to-one relationship between them. There is no theoretical reason to separate a single entity like this with some fields in a single record of one table and others in a single record of another table.

So, we have 4 arguments, why using a map for implementing get in set is bad:

  1. the map is unnecessary when we have a set of unique objects
  2. map introduces redundancy in Runtime storage
  3. map introduces code bloat in the DB (in the Collections)
  4. using map contradicts the data storage normalization
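
The inconsistency risk described above can be illustrated with a short sketch. The Employee type and the index helper are hypothetical names invented for this illustration; the only point is that a map accepts a key that disagrees with the key stored inside the record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type: the key lives inside the object itself.
class Employee {
    final String id;    // the "primary key"
    final String name;  // satellite data

    Employee(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class RedundantKeyDemo {
    // Stores the employee under a separately supplied key, as map.put(key, obj) does.
    static Map<String, Employee> index(String key, Employee e) {
        Map<String, Employee> byId = new HashMap<>();
        byId.put(key, e);
        return byId;
    }

    public static void main(String[] args) {
        Employee alice = new Employee("E1", "Alice");
        // Nothing stops the caller from passing a key that disagrees with the record.
        Map<String, Employee> byId = index("E2", alice);
        Employee found = byId.get("E2");
        System.out.println(found.id.equals("E2")); // false: the two copies of the key disagree
    }
}
```

A set keyed by the object's own fields cannot get into this state, because there is only one copy of the key.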

Even if you are not aware of the record set idea and data normalization, playing with collections, you may discover this data structure and algorithm yourself, as we, org.eclipse.KeyedHashSet and C++ STL designers did.

I was banned from the Sun forum for pointing out these ideas. Bigotry is the only argument against reason, and this world is dominated by bigots. They do not want to see concepts and how things can be different or improved. They see only the actual world and cannot imagine that the design of Java Collections may have deficiencies and could be improved. It is dangerous to remind such people of rational things. They teach you their blindness and punish you if you do not obey.

Added Dec 2013: SICP also says that DB is a set with keyed records rather than a map:

A typical data-management system spends a large amount of time accessing or modifying the data in the records and therefore requires an efficient method for accessing records. This is done by identifying a part of each record to serve as an identifying key. Now we represent the data base as a set of records.

Val
  • 1
  • 8
  • 40
  • 64
  • I sometimes perceive a tendency, hardly unique to Java's maintainers, to view with hostility any improvements which would, if adopted, quickly demonstrate that millions of programming hours have been wasted as a result of not adopting them sooner. The greater the savings that would be achieved by implementing a change, the greater the hostility to it. Java has at least finally allowed interfaces to declare default methods, something .NET doesn't have yet, even though such ability poses more problems in Java than it would in .NET. – supercat Jun 11 '14 at 22:36
2

Well, if you've already "got" the thing from the set, you don't need to get() it, do you? ;-)

Your approach of using a Map is The Right Thing, I think. It sounds like you're trying to "canonicalize" objects via their equals() method, which I've always accomplished using a Map as you suggest.
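
As a minimal sketch of that canonicalizing Map, assuming a hypothetical Interner class (not part of the JDK): each value is mapped to the first equal instance that was stored, and that instance is what lookups return.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps every value to the first equal instance seen.
class Interner<T> {
    private final Map<T, T> canonical = new HashMap<>();

    // Returns the stored instance equal to candidate, storing candidate if none exists yet.
    T intern(T candidate) {
        T existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    // Returns the stored instance equal to probe, or null if there is none.
    T get(Object probe) {
        return canonical.get(probe);
    }
}
```

With this in place, interner.get(dummy) hands back the original instance rather than the dummy that was used for the lookup.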

andersoj
  • 22,406
  • 7
  • 62
  • 73
  • 1
    I agree: For the Set it doesn't make sense to return a different object, since they are equal() anyway and that implies that they can be used interchangeably anyway. Semantically you'd want a Map from an Object to its canonicalized ("interned" for Strings, for example) instance. The only unusual thing here would be that the key and value are the same object. – Joachim Sauer May 14 '09 at 11:20
  • Until now I agreed with this statement. But here's why I don't anymore: Imagine class XYZ with fields A, B and C. XYZ's equality is based on, say, only A and B. So in the set I might have an XYZ instance with all fields set. Suppose I have another instance of XYZ where only A and B are set. One would need `get(Object)` to get the instance from the set where A, B and C are set. – Boss Man Jan 27 '21 at 15:53
1

If you think about it as a mathematical set, you can derive a way to find the object.
Intersect the set with a collection containing only the object you want to find. If the intersection is not empty, the only item left in the set is the one you were looking for.

public <T> T findInSet(T findMe, Set<T> inHere){
   // Caution: retainAll modifies inHere, so pass in a copy if the set must be kept.
   inHere.retainAll(Arrays.asList(findMe));
   if(!inHere.isEmpty()){
       return inHere.iterator().next();
   }
   return null;
}

It's not the most efficient use of memory, but it's functionally and mathematically correct.

DanMan0715
  • 11
  • 1
1

I'm not sure if you're looking for an explanation of why Sets behave this way, or for a simple solution to the problem it poses. Other answers dealt with the former, so here's a suggestion for the latter.

You can iterate over the Set's elements and test each one of them for equality using the equals() method. It's easy to implement and hardly error-prone. Obviously if you're not sure if the element is in the set or not, check with the contains() method beforehand.

This isn't efficient compared to, for example, HashSet's contains() method, which does "find" the stored element, but won't return it. If your sets may contain many elements it might even be a reason to use a "heavier" workaround like the map implementation you mentioned. However, if it's that important for you (and I do see the benefit of having this ability), it's probably worth it.

Oren Shalev
  • 952
  • 2
  • 9
  • 19
  • 2
    Yes, of course I can iterate over all the elements, but then the complexity of finding the object instance becomes linear, which is very bad, especially if the backing set is a HashSet. You are right, I was looking for an explanation of why Sets behave that way, not a workaround. – GreenieMeanie May 14 '09 at 13:53
1

So I understand that you may have two equal objects but they are not the same instance.

Such as

Integer a = new Integer(3);
Integer b = new Integer(3);

In which case a.equals(b) because they refer to the same intrinsic value but a != b because they are two different objects.

There are other implementations of Set, such as IdentitySet, which do a different comparison between items.
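
The JDK itself has no class named IdentitySet, but an identity-based set can be assembled from standard pieces. The sketch below assumes that this is the kind of set meant here; IdentitySetDemo and newIdentitySet are names made up for the example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Builds a set that compares elements with == instead of equals().
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> set = newIdentitySet();
        String a = new String("x");
        String b = new String("x");
        set.add(a);
        set.add(b);                                         // kept: a.equals(b), but a != b
        System.out.println(set.size());                     // 2
        System.out.println(set.contains(new String("x")));  // false
    }
}
```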

However, I think that you are trying to apply a different philosophy to Java. If your objects are equal (a.equals(b)) although a and b have a different state or meaning, there is something wrong here. You may want to split that class into two or more semantic classes which implement a common interface - or maybe reconsider .equals and .hashCode.

If you have Joshua Bloch's Effective Java, have a look at the chapters called "Obey the general contract when overriding equals" and "Minimize mutability".

Sorin Mocanu
  • 936
  • 5
  • 11
  • Suppose code will be reading a lot of strings from files, and many of them will contain the same characters. Replacing the references to the strings read from disk with references to strings stored in a map could greatly reduce the amount of storage required to hold them all, and would also expedite comparisons among them. Having X and Y be references to distinct 5,000-character strings with identical content may be semantically equivalent to having them be the same string, but X.Equals(Y) will be much faster in the latter case. – supercat Dec 26 '13 at 18:23
1

Just use the Map solution... a TreeSet and a HashSet also do it, since they are backed by a TreeMap and a HashMap, so there is no penalty in doing so (actually there should be a minimal gain).

You may also extend your favorite Set to add the get() method.
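
One possible shape for such a set is sketched below. It is an assumption about what "extend" could look like rather than the answerer's code: instead of subclassing HashSet, it extends AbstractSet and keeps the elements in a HashMap from each element to itself. The class name GettableHashSet is hypothetical.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical class: a Set that can hand back its own stored instance.
class GettableHashSet<E> extends AbstractSet<E> {
    private final Map<E, E> elements = new HashMap<>();

    // Returns the stored element equal to probe, or null if absent.
    public E get(Object probe) {
        return elements.get(probe);
    }

    @Override
    public boolean add(E e) {
        // Keep the first instance; an equal element is never replaced.
        if (elements.containsKey(e)) return false;
        elements.put(e, e);
        return true;
    }

    @Override
    public boolean contains(Object o) {
        return elements.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        if (!elements.containsKey(o)) return false;
        elements.remove(o);
        return true;
    }

    @Override
    public Iterator<E> iterator() {
        return elements.keySet().iterator();
    }

    @Override
    public int size() {
        return elements.size();
    }
}
```

Because it still implements Set, it can be passed anywhere a Set is expected, and only callers that know the concrete type see get().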

user85421
  • 28,957
  • 10
  • 64
  • 87
1

I think your only solution, given some Set implementation, is to iterate over its elements to find one that is equals() -- then you have the actual object in the Set that matched.

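// Returns the element of the set that equals target, or null if there is none.
static <K> K find(Set<K> set, K target) {
  for (K element : set) {
    if (target.equals(element)) {
      return element;
    }
  }
  return null;
}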
Sean Owen
  • 66,182
  • 23
  • 141
  • 173
0

I had the same problem. I fixed it by converting my set to a Map, and then getting them from the map. I used this method:

public Map<MyObject, MyObject> convertSetToMap(Set<MyObject> set)
{
    Map<MyObject, MyObject> myObjectMap = new HashMap<MyObject, MyObject>();

    for(MyObject myObject: set){
        myObjectMap.put(myObject, myObject);
    }
    return myObjectMap;
}

Now you can get items from your set by calling this method like this:

convertSetToMap(myset).get(myobject);
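
Calling the conversion for every lookup copies the whole set each time. A variant that may be preferable, sketched here with the hypothetical names SetIndex and toIdentityMap, builds the map once with the Java 8 stream API and reuses it for later lookups.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class SetIndex {
    // Builds an element -> same element map once, so later lookups are hash lookups.
    // Note: Collectors.toMap rejects null values, so the set must not contain null.
    static <T> Map<T, T> toIdentityMap(Set<T> set) {
        return set.stream()
                  .collect(Collectors.toMap(Function.identity(), Function.identity()));
    }
}
```

The resulting map is a snapshot: elements added to the set afterwards do not show up in it.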

You can override equals in your class so that it checks only certain properties, like Id or name.

Ben
  • 6,107
  • 6
  • 29
  • 40
0

If you have filed a request for this in the Java bug parade, list it here and we can vote it up. I think there should at least be a convenience method in java.util.Collections that just takes a set and an object and is implemented something like:

static Object searchSet(Set ss, Object searchFor){

        Iterator it = ss.iterator();
        while(it.hasNext()){
            Object s = it.next();
            if(s != null && s.equals(searchFor)){
                return s;
            }
        }
        return null; // no equal element found
}
tgkprog
  • 4,493
  • 4
  • 41
  • 70
0

This is obviously a shortcoming of the Set API.

Simply, I want to lookup an object in my Set and update its property.

And I HAVE TO loop through my (Hash)Set to get to my object... Sigh...

0

I agree that I'd like to see Set implementations provide a get() method.

As one option, in the case where your Objects implement (or can implement) java.lang.Comparable, you can use a TreeSet. Then the get() type function can be obtained by calling ceiling() or floor(), followed by a check for the result being non-null and equal to the comparison Object, such as:

TreeSet<MyObject> myTreeSet = new TreeSet<MyObject>();
:
:

// Equivalent of a get() and a null-check, except for the incorrect value sitting in
// returnedMyObject in the not-equal case.
MyObject returnedMyObject = myTreeSet.ceiling(comparisonMyObject);

if ((null != returnedMyObject) && returnedMyObject.equals(comparisonMyObject)) {
:
:
}
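
A compact, self-contained variant of the same idea is sketched below. TreeSetGet is a hypothetical helper, and it differs from the snippet above in one respect: it relies on the set's own ordering (via contains) rather than on equals(), which matters when the comparator and equals() disagree.

```java
import java.util.TreeSet;

class TreeSetGet {
    // Returns the element the set treats as equal to probe, or null if there is none.
    // If the set contains an element that compares equal to probe, that element is
    // the least element >= probe, which is exactly what ceiling() returns.
    static <T> T get(TreeSet<T> set, T probe) {
        return set.contains(probe) ? set.ceiling(probe) : null;
    }
}
```

For example, with a TreeSet built on String.CASE_INSENSITIVE_ORDER that holds "Hello", looking up "HELLO" yields the stored "Hello". Both calls are logarithmic in the size of the set.
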
Eric_B
  • 1
0

The reason why there is no get is simple:

If you need to get the object X from the set, it is because you need something from X and you don't have the object.

If you do not have the object, then you need some means (a key) to locate it: its name, a number, whatever. That's what maps are for, right?

map.get( "key" ) -> X!

Sets do not have keys; you need to traverse them to get the objects.

So, why not add a handy get( X ) -> X

That makes no sense, right? Because you already have X, a purist will say.

But now look at it as a non-purist, and see if you really want this:

Say I make an object Y which matches the equals of X, so that set.get(Y) -> X. Voilà, then I can access the data of X that I didn't have. Say, for example, X has a method called flag() and I want the result of that.

Now look at this code.

Y

X = map.get( Y );

So Y.equals( X ) is true!

but..

Y.flag() == X.flag() is false. (Weren't they equal?)

So, you see, if a set allowed you to get objects like that, it would surely break the basic semantics of equals. Later you are going to live with little clones of X all claiming that they are the same when they are not.

You need a map, to store stuff and use a key to retrieve it.

Alex Vaz
  • 496
  • 5
  • 8
0

I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.

Ganesh S
  • 1
  • 2
0

A simple interface/API gives more freedom to implementations. For example, if the Set interface were reduced to just a single contains() method, we would get the set definition typical of functional programming: it is just a predicate, and no objects are actually stored. This is also true for java.util.EnumSet: it contains only a bitmap with one bit for each possible value.
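
To make the predicate view concrete, here is a small sketch; PredicateSetDemo, EVENS and Color are names invented for the illustration. The "set" of even numbers stores no elements at all, so there would be no stored instance for a get() to return.

```java
import java.util.EnumSet;
import java.util.function.Predicate;

class PredicateSetDemo {
    enum Color { RED, GREEN, BLUE }

    // A "set" of even numbers defined only by its membership test; nothing is stored.
    static final Predicate<Integer> EVENS = n -> n % 2 == 0;

    public static void main(String[] args) {
        System.out.println(EVENS.test(4));             // true
        System.out.println(EVENS.test(7));             // false

        // EnumSet answers membership for enum constants without storing separate copies.
        EnumSet<Color> warm = EnumSet.of(Color.RED);
        System.out.println(warm.contains(Color.RED));  // true
    }
}
```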

user158037
  • 2,659
  • 1
  • 24
  • 27
0

It's just an opinion. I believe we need to understand that there are several Java classes without fields/properties, i.e. with only methods. In that case equality cannot be determined by comparing state; one such example is request handlers. See the example of a JAX-RS application below. In this context a Set makes more sense than any other data structure.

@ApplicationPath("/")
public class GlobalEventCollectorApplication extends Application {
    @Override
    public Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new HashSet<Class<?>>();
        classes.add(EventReceiverService.class);
        classes.add(VirtualNetworkEventSerializer.class);
        return classes;
    }
}

To answer your question: if you have a shallow employee object (i.e. only EMPID, which is used in the equals method to determine uniqueness) and you want to get the deep object by doing a lookup in a set, then Set is not the right data structure, as its purpose is different.

Dave
  • 962
  • 5
  • 19
  • 44
0

List is an ordered data structure: it follows insertion order. Hence the data you put in will be available at the exact position where you inserted it.

List<Integer> list = new ArrayList<>();
list.add(1);
list.add(2);
list.add(3);

list.get(0); // will return value 1

Remember this as simple array.

Set is an unordered data structure: it follows no order. The data you insert at a certain position may turn up at any position.

Set<Integer> set = new HashSet<>();
set.add(1);
set.add(2);
set.add(3);
//assume it has get method
set.get(0); // what are you expecting this to return. 1?.. 

But it may return something else. Hence it does not make sense to create a get method in Set.

Note: for this explanation I used the int type; the same applies to Object types.

Kannan Msk
  • 157
  • 1
  • 6
  • 2
    You argue the non-applicability of `Set.get(int index)` (neglecting intervening `clear()`, `remove()`, [`add(int index, E element)`](https://docs.oracle.com/javase/8/docs/api/java/util/List.html#add-int-E-)) - the question was about `Set.get(Object o)`. (Downvoting mainly to keep this at zero, at most.) – greybeard Aug 24 '17 at 10:11
  • `this same is applicable for Object type also` please *do* show something else for members: without a definition/declaration, I don't know whether the `0` in `set.get(0)` is an index or a *key* - if I did, it was avoidably confusing, still. Consider `String` literals. – greybeard Aug 24 '17 at 10:49
0
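// first() throws NoSuchElementException on an empty tail set, so check for that first
Object fromSet = set.tailSet(obj).isEmpty() ? null : set.tailSet(obj).first();

if (! obj.equals(fromSet)) fromSet = null;

does what you are looking for, provided the set is a SortedSet (for example a TreeSet). I don't know why Java hides it.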

greg
  • 159
  • 1
  • 3
0

"I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true."

This doesn't make sense. Say you do:

Set<Foo> s = new HashSet<Foo>();
s.add(new Foo(...));
...
Foo newFoo = ...;

You now do:

s.contains(newFoo)

If you want that to only be true if an object in the set is == newFoo, implement Foo's equals and hashCode with object identity. Or, if you're trying to map multiple equal objects to a canonical original, then a Map may be the right choice.

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
  • 1
    How does it "not make sense"? `s.contains(newFoo)` should be true if `newFoo.equals(originalFoo)`, as in the definition of `.contains()`. He just wants to get `originalFoo`, not `newFoo` that he has. – user102008 Aug 31 '11 at 23:10
0

I think the expectation is that equals truly represents some equality, not simply that the two objects have the same primary key, for example. And if equals represented two really equal objects, then a get would be redundant. The use case you want suggests a Map, and perhaps a different value for the key, something that represents a primary key rather than the whole object, with equals and hashCode then implemented properly to match.

Yishai
  • 90,445
  • 31
  • 189
  • 263
  • not true, at least not for String, as we know. (OK, not a real difference if they have the same content) – user85421 May 14 '09 at 09:10
0

Functional Java has an implementation of a persistent Set (backed by a red/black tree) that incidentally includes a split method that seems to do kind of what you want. It returns a triplet of:

  1. The set of all elements that appear before the found object.
  2. An object of type Option that is either empty or contains the found object if it exists in the set.
  3. The set of all elements that appear after the found object.

You would do something like this:

MyElementType found = hayStack.split(needle)._2().orSome(hay);
Apocalisp
  • 34,834
  • 8
  • 106
  • 155
0

Say I have a User POJO with an ID and a name. The ID maintains the contract between equals and hashCode; the name is not part of object equality. I want to update the name of the user based on input from somewhere, say a UI.

As Java's Set doesn't provide a get method, I need to iterate over the set in my code and update the name when I find the equal object (i.e. when the ID matches).

If Set had a get method, this code could have been shortened.
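
The situation described above might look like the following sketch. The User class and the rename helper are hypothetical stand-ins for the POJO in question; the loop is the linear scan that a get method on Set would replace.

```java
import java.util.Set;

// Hypothetical POJO: equality is based on id only; name is mutable satellite data.
class User {
    final int id;
    String name;

    User(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && ((User) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class UserUpdate {
    // Linear scan: with only the Set API, iteration is the way to reach the stored instance.
    static boolean rename(Set<User> users, int id, String newName) {
        User probe = new User(id, null);
        for (User u : users) {
            if (u.equals(probe)) {
                u.name = newName;   // safe: name takes no part in equals/hashCode
                return true;
            }
        }
        return false;
    }
}
```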

Java now comes with all kinds of stupid things like javadb and the enhanced for loop; I don't understand why in this particular case they are being purist.

tapasvi
  • 322
  • 1
  • 7
  • 15
-6

I think you've answered your own question: it is redundant.

Set provides Set#contains (Object o) which provides the equivalent identity test of your desired Set#get(Object o) and returns a boolean, as would be expected.

alphazero
  • 27,094
  • 3
  • 30
  • 26
  • 1
    It is not redundant. He wants a `get()` method that returns the object inside the set, not a boolean. – user102008 Aug 31 '11 at 23:11
  • If you are the down-voter, then I suggest you review his "typically .equals() does not take into account all the properties of the object" and if that does not raise the alarm bells, then please get another throwaway account and down vote one more time. *Je mon fou* .. as to a get() -- with no args? -- that returns the object in the set, I suggest you write that api and traverse your collection with it .. ;) – alphazero Sep 01 '11 at 03:21
  • 1
    I didn't say a get() with no args. That's the standard way to indicate a function in computer world, regardless of args, in case you don't know. And no, it does not raise any alarm bells. Many objects are equal that are different. For example, an ArrayList and LinkedList are required to be equal if they contain the same elements in the same order. Maybe you haven't worked with Java long enough to know that. – user102008 Sep 01 '11 at 03:40
  • We're not discussing collection objects; rather the object content of a specific type of collection: a Set. Since you are an old timer in "computer world", I am sure you know the drill about equals() & hashcode() .. anyway, this promises to be a charming conversation. Do go on. – alphazero Sep 01 '11 at 04:50