19

I have been asked a question that is a little ambiguous for my coursework.

The array of strings is regarded as a set, i.e. unordered.

I'm not sure whether I need to remove duplicates from this array?

I've tried googling but one place will tell me something different to the next. Any help would be appreciated.

dev6546
  • 1,412
  • 7
  • 21
  • 40
  • 2
    What are the references that you have googled? I'd love to see those that say that set elements can be duplicate? – Edwin Dalorzo Apr 04 '12 at 13:00
  • Set could not have duplicate element same as like in hash or dictionary key, because set implementation is almost similar to hash with dummy value. – James Sapam Dec 23 '13 at 12:48

4 Answers4

36

From Wikipedia in Set (Mathematics)

A set is a collection of well defined and distinct objects.

Perhaps the confusion derives from the fact that a set does not depend on the way its elements are displayed. A set remains the same if its elements are allegedly repeated or rearranged.

As such, the programming languages I know would not put an element into a set if the element already belongs to it, or they would replace it if it already exists, but would never allow a duplication.

Programming Language Examples

Let me offer a few examples in different programming languages.

In Python

A set in Python is defined as "an unordered collection of unique elements". And if you declare a set like a = {1,2,2,3,4} it will only add 2 once to the set.

If you do print(a) the output will be {1,2,3,4}.

Haskell

In Haskell the insert operation of sets is defined as: "[...] if the set already contains an element equal to the given value, it is replaced with the new value."

As such, if you do this: let a = fromList([1,2,2,3,4]), if you print a to the main ouput it would render [1,2,3,4].

Java

In Java sets are defined as: "a collection that contains no duplicate elements.". Its add operation is defined as: "adds the specified element to this set if it is not already present [...] If this set already contains the element, the call leaves the set unchanged".

Set<Integer> myInts = new HashSet<>(asList(1,2,2,3,4));
System.out.println(myInts);

This code, as in the other examples, would ouput [1,2,3,4].

Edwin Dalorzo
  • 76,803
  • 25
  • 144
  • 205
  • Ah right thank you, so it doesn't matter if I remove them or change the order? – dev6546 Apr 04 '12 at 13:16
  • 1
    My point is that there is not a mathematical property of sets that consists in determining how many times an element belongs to a set or not. If you had A={1,2,2,3,4}, you can ask if 2 ∈ A and the answer is yes, regardless of how many times it appeared in the set. – Edwin Dalorzo Apr 04 '12 at 13:21
  • @Lewis: The point is that a set doesn't imply ANYTHING about ordering whatsoever, so it shouldn't even be a question. Sure, it could be that your implementation of a set keeps everything in the order of insertion, but that is not defined in the definition of a set. A valid implementation of a set could be something which simply adds any items you want to add, but when you ask it which items it has it only returns every value one (i.e. distinct values). – Erik van Brakel Apr 04 '12 at 15:39
  • So can a database table be considered a set of rows, since typically each row, taken as a whole, is unique? – Michael Jan 31 '15 at 06:44
  • 1
    @Michael I think that a database _row_ corresponds with the mathematical concept of a [tuple](http://en.m.wikipedia.org/wiki/Tuple) and a _table_ is a [relation](http://en.m.wikipedia.org/wiki/Relation_(mathematics)) which is a set of tuples and so the _database_ could be considered a set of relations. – Edwin Dalorzo Jan 31 '15 at 14:24
  • @Michael I forgot to mention that a database table could have repeated rows, nothing prevents that from happening (e.g. no primary/candidate keys, a view), but the mathematical relation, since it is based on the concept of a set, would consider that given tuple only once, since a set contains only unique elements. At least this is my understanding on the subject. So, like in the examples of implementations of sets in programming languages, there are subtle differences. – Edwin Dalorzo Jan 31 '15 at 15:43
8

A set cannot have duplicate elements by its mere definition. The correct structure to allow duplicate elements is Multiset or Bag:

In mathematics, a multiset (or bag) is a generalization of the concept of a set that, unlike a set, allows multiple instances of the multiset's elements. For example, {a, a, b} and {a, b} are different multisets although they are the same set. However, order does not matter, so {a, a, b} and {a, b, a} are the same multiset.

A very common and useful example of a Multiset in programming is the collection of values of an object:

values({a: 1, b: 1}) //=>  Multiset(1,1)

The values here are unordered, yet cannot be reduced to Set(1) that would e.g. break the iteration over the object values.

Further, quoting from the linked Wikipedia article (see there for the references):

Multisets have become an important tool in databases.[18][19][20] For instance, multisets are often used to implement relations in database systems. Multisets also play an important role in computer science.

Dmitri Zaitsev
  • 13,548
  • 11
  • 76
  • 110
  • 1
    It's also very relevant to psychology and AI programming, as comparing sets allows you to determine whether two things are the same, or less or more specific. In natural language processing this allows the distinction of for example synonyms, where the elements of a set consist of the various parts of a definition that make up the word itself. If all the elements are identical, they must be the same word because they claim the same concept. – G_V Jul 24 '18 at 10:16
  • @G_V Thanks for your comment. – Dmitri Zaitsev Jul 28 '18 at 15:32
6

Let A={1,2,2,3,4,5,6,7,...} and B={1,2,3,4,5,6,7,...} then any element in A is in B and any element in B is in A ==> A contains B and B contains A ==> A=B. So of course sets can have duplicate elements, it's just that the one with duplicate elements would end up being exactly the same as the one without duplicate elements.

vercetti
  • 124
  • 1
  • 1
  • 7
    You are confusing set declaration with the set data structure. The declaration can have duplicates but not the structure itself. – Dmitri Zaitsev Apr 21 '17 at 04:31
  • 2
    This is an answer? Are you serious? This will confuse anyone studying Science and Mathematics in particular. – user1935724 Nov 09 '20 at 19:25
  • I disagree with the previous commenters. This answer is the best. Both A={1,1} and B={1} are valid sentences of set theory. Thus, one must explain via the axioms of set theory why A=B, and that requires invoke the Axiom of extensionality: $\forall x \forall y [\forall z(z\in x \Leftrightarrow z\in y)\implies x=y]$. Stating that one "cannot write down A" because it violates the definition is false, it is a valid sentence. – Anon21 May 23 '21 at 12:14
2

"Sets are Iterables that contain no duplicate elements." https://docs.scala-lang.org/overviews/collections/sets.html

Jay
  • 51
  • 2
  • Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – double-beep Jan 30 '19 at 20:35
  • 1
    @double-beep The essential part *is* included. Minimal answer, but it is an answer. – Zoe Feb 02 '19 at 17:18
  • @Zoe my thought exactly :) – Jay Feb 04 '19 at 03:52