Evaluate equality among n variables, without checking each pair

Question

I am not sure whether this is possible without doing a full comparison between every pair of the N variables.

For int or float, I can use simple math to figure out whether the numbers are the same: compute the sum and divide by the number of elements.

But if I want to check whether 3 variables contain the same string, I can't use this approach. It may be that var1 and var2 are the same, or that var3 and var1 are the same, or that var2 and var3 are the same. From what I understand, unless you compare EACH pair of variables, there is no way to know whether any two of them have the same content.

Did I miss something, or is there an easier way?

Example with 3 variables:

string var1 = "hello";
string var2 = "there";
string var3 = "hello";

if (var1 == var2 || var1 == var3 || var2 == var3)
    //got a duplicate
else
    //all are different

Imagine comparing 20 variables: writing the comparison statement would take forever.

To me this sounds similar to a search problem. Could I apply the same concept that I would apply to a binary search over sorted data?

It does not matter whether the sequences are the same, although I see the flaw in the math of summing and then dividing by the number of elements. 5,5,5 gives 15, which divided by 3 matches the first variable (5), but the same happens for the sequence 5,4,6: the sum divided by 3 is again 5, yet here there are no duplicates. – Feb 23 '16 at 09:57
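The sorting idea raised in the question can be sketched roughly as follows. This is only an illustration, not something from the thread: `SortedDuplicateCheck` and `HasDuplicate` are made-up names, and ordinal string comparison is an assumption. After sorting, equal values sit next to each other, so only neighbours need to be compared, which costs O(n log n) comparisons instead of one per pair.

```csharp
using System;

static class SortedDuplicateCheck
{
    // Hypothetical helper: sorts a copy of the input, then compares only neighbours.
    public static bool HasDuplicate(params string[] values)
    {
        var sorted = (string[])values.Clone();       // leave the caller's array untouched
        Array.Sort(sorted, StringComparer.Ordinal);  // assumption: ordinal ordering is acceptable
        for (int i = 1; i < sorted.Length; i++)
        {
            // After sorting, any duplicate must be adjacent to its twin.
            if (sorted[i] == sorted[i - 1]) return true;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(HasDuplicate("hello", "there", "hello")); // True
        Console.WriteLine(HasDuplicate("5", "4", "6"));             // False
    }
}
```

The second call is the 5,4,6 case that fools the sum-and-divide trick.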

Tim Schmelter · Accepted Answer · 2016-02-23T10:13:11.427

4

Put them into a collection, then your life is much easier.

A subtle but efficient approach is using a HashSet<>:

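// Requires: using System.Collections.Generic; using System.Linq;
string[] collection = { var1, var2, var3 };
var set = new HashSet<string>();
// Add returns false for an item that is already in the set, so All stops at the first duplicate.
bool noDuplicate = collection.All(set.Add);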

HashSet.Add returns true if the item could be added, which is the case only if it was not already in the set, so the set never contains duplicates. Enumerable.All stops enumerating at the first false. If all items are unique, all of them can be added to the set and noDuplicate will be true.
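For anyone who wants to try this, here is a minimal self-contained sketch of the same idea wrapped in a method. The names `DuplicateCheck` and `AllDistinct` are just illustrative, not framework APIs; the `params` keyword lets you pass the variables directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateCheck
{
    // Hypothetical helper name; not part of the framework.
    public static bool AllDistinct(params string[] values)
    {
        var seen = new HashSet<string>();
        // seen.Add returns false for a repeated value, which makes All return false immediately.
        return values.All(seen.Add);
    }

    static void Main()
    {
        Console.WriteLine(AllDistinct("hello", "there", "hello")); // False
        Console.WriteLine(AllDistinct("hello", "there", "again")); // True
    }
}
```

Note that `params` still allocates an array behind the scenes, so this is about convenience rather than saving allocations.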

edited Feb 23 '16 at 10:13

answered Feb 23 '16 at 09:55


This seems to be the best solution; it is quick, simple and not much code to write. Thanks – Feb 23 '16 at 09:58
Won't this check if only the first element has duplicates? – Dovydas Šopa Feb 23 '16 at 09:59
@DovydasSopa: of course, so I've modified my answer :) – Tim Schmelter Feb 23 '16 at 10:02
Does this also work if I create the set and assign the variables at instantiation time? Or do I have to use the .Add method for each variable? – Feb 23 '16 at 10:12
@newbiez: sorry, I can't quite follow. – Tim Schmelter Feb 23 '16 at 10:13
Like you do with the string collection array: can I create a hashset of strings and pass the varN as parameters when instantiating it? var set = new HashSet<string>() { var1, var2, var3,...varN}; – Feb 23 '16 at 10:15
1

@newbiez: Yes, you can pass a collection to the constructor of `HashSet`. So another way would be to use `var set = new HashSet<string>(collection); bool noDuplicate = collection.Length == set.Count;`. But I like the LINQ approach more since it stops as soon as a duplicate is found. – Tim Schmelter Feb 23 '16 at 10:17
1

Or are you asking whether a `HashSet` supports [collection initializer](https://msdn.microsoft.com/en-us/library/bb384062.aspx) syntax? Yes, it does. – Tim Schmelter Feb 23 '16 at 10:19
Thanks, I am trying to create as few objects as possible; creating the set directly, instead of creating an array of strings and a set, feels better space-wise (in the long run of course; I am writing something scalable for both time and space efficiency) – Feb 23 '16 at 10:22
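To illustrate the "only one object" variant discussed in these comments, here is a rough sketch with made-up names (`SingleSetCheck`, `HasDuplicate`, `HasDuplicateViaInitializer`). The first method chains `Add` calls with `||`, so it allocates only the set and still stops at the first duplicate; the second uses the collection initializer and compares the count, which always adds every variable.

```csharp
using System;
using System.Collections.Generic;

static class SingleSetCheck
{
    // Hypothetical helper: only the HashSet is allocated, no intermediate array.
    public static bool HasDuplicate(string a, string b, string c)
    {
        var set = new HashSet<string>();
        // || short-circuits, so later Add calls are skipped once a duplicate is found.
        return !set.Add(a) || !set.Add(b) || !set.Add(c);
    }

    // Collection-initializer variant: every item is added, then the counts are compared.
    public static bool HasDuplicateViaInitializer(string a, string b, string c)
    {
        var set = new HashSet<string> { a, b, c };
        return set.Count != 3;
    }

    static void Main()
    {
        Console.WriteLine(HasDuplicate("hello", "there", "hello"));               // True
        Console.WriteLine(HasDuplicateViaInitializer("hello", "there", "again")); // False
    }
}
```

The chained form does not scale nicely to 20 variables in source code, so it is a trade-off between allocations and readability.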

score 1 · Answer 2 · answered Feb 23 '16 at 10:01

1

You may try something like this:

List<string> l = new List<string>() {"a", "b", "a"};
bool isDuplicate = l.GroupBy(i => i).Any(x => x.Count() > 1);

answered Feb 23 '16 at 10:01

VDN


Dovydas Šopa · Answer 3 · 2016-02-23T10:03:31.903

0

Put all elements into an array (or list) and compare its element count with the count of distinct elements:

string var1 = "hello";
string var2 = "there";
string var3 = "hello";
string[] values = new [] { var1, var2, var3 };
if (values.Length != values.Distinct().Count()) {
    // Got a duplicate.
} else {
    // All are different.
}

edited Feb 23 '16 at 10:03

answered Feb 23 '16 at 09:56


Thanks; although this way there are 2 operations: one to add to the list, another to search for duplicates. I am not sure how the Distinct function works internally, but it seems that this would have complexity 2n – Feb 23 '16 at 10:01
@newbiez It will be equivalent to TimSchmelter's solution - http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,4ab583c7d8e84d6d. – Eugene Podskal Feb 23 '16 at 10:18
@Eugene Podskal It's not totally equivalent. TimSchmelter's solution will stop at the first duplicate found. This solution will need to go through all the items. – Dovydas Šopa Feb 23 '16 at 10:32
1

@DovydasSopa Yes, I meant that it won't be O(n*n), but amortised O(n). Internally it uses a hash-based set, though it will have to iterate through all the items. – Eugene Podskal Feb 23 '16 at 10:36
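The difference discussed in these comments can be made visible with a small sketch. Everything here is illustrative (the `EarlyExitDemo` class and its helpers are made up): a counting iterator records how many items each approach actually reads from the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EarlyExitDemo
{
    // Hypothetical helper: yields the items and counts how many were consumed.
    static IEnumerable<string> Counted(string[] items, int[] counter)
    {
        foreach (var item in items)
        {
            counter[0]++;
            yield return item;
        }
    }

    // Items read by All(set.Add), which stops at the first duplicate.
    public static int ItemsReadByAll(string[] values)
    {
        var counter = new int[1];
        var set = new HashSet<string>();
        Counted(values, counter).All(set.Add);
        return counter[0];
    }

    // Items read by Distinct().Count(), which always enumerates everything.
    public static int ItemsReadByDistinct(string[] values)
    {
        var counter = new int[1];
        Counted(values, counter).Distinct().Count();
        return counter[0];
    }

    static void Main()
    {
        string[] values = { "hello", "hello", "there", "again" };
        Console.WriteLine(ItemsReadByAll(values));      // 2
        Console.WriteLine(ItemsReadByDistinct(values)); // 4
    }
}
```

Both are linear in the worst case; the early exit only helps when a duplicate appears early in the sequence.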

Tyress · Answer 4 · 2016-02-23T10:29:35.997

0

    string var1 = "hello";
    string var2 = "there";
    string var3 = "hello";
    string var4 = "hello";
    string var5 = "there";
    string var6 = "hello";
    List<string> strings = new List<string>() { var1, var2, var3,var4,var5,var6 };
    var grouped = strings.GroupBy(x => x);
    Dictionary<string,int> stringAndCount = grouped.ToDictionary(x => x.Key, x => x.Count());

The string (key) is the content of the variable, and the int value is the number of variables in your list that equal that key.

Key: hello, Value: 4
Key: there, Value: 2

EDIT:

If you just want to know if there are duplicates:

List<string> strings = new List<string>() { var1, var2, var3,var4,var5,var6 };

bool hasDupes = strings.Count != (new HashSet<string>(strings)).Count;

edited Feb 23 '16 at 10:29

answered Feb 23 '16 at 09:59


Thanks. So you create a list first and then a dictionary? That may work, but it is quite a lot of overhead, both time-wise and space-wise – Feb 23 '16 at 10:11
@newbiez it *will* work, haha. This is only if you want to know how many duplicates there are. If you don't really mind, a HashSet will do it. – Tyress Feb 23 '16 at 10:13
@newbiez Also I'm not sure which part you think of as having overhead, but a [Dictionary is also a hash](http://stackoverflow.com/a/15042066/1685167). Plus, GroupBy returns an IEnumerable and is evaluated lazily. I won't try to argue though, since although I am concerned with more obvious overhead, C# is not like C, where simpler techniques are more optimized space- and time-wise, because of JIT / IL and stuff. – Tyress Feb 23 '16 at 10:22
Yep, I am not really interested in how many duplicates I have; I only need to know whether there are duplicates, and act accordingly :) I am looking at the scenario where the function is called multiple times in multiple instances, which causes 2 objects to be created every time... so I am trying to find a solution that generates as few objects as possible, to be more efficient (I work with big numbers, so the struggle is real) – Feb 23 '16 at 10:24
@newbiez edited my answer to something that might be more useful, although it's probably answered elsewhere on this page already (EDIT: A little more complex than Tim's answer :P) – Tyress Feb 23 '16 at 10:30
