24

Given an array of numbers, generate all unique pairs.

For example, given [ 1, 2, 3, 4, 5 ] the unique number pairs would be:

(1, 2), (1, 3), (1, 4), (1, 5)

(2, 3), (2, 4), (2, 5)

(3, 4), (3, 5)

(4, 5)

My solution is as follows:

int[] numbers = new int[] { 1, 2, 3, 4, 5 };
HashSet<Pair> pairs = new HashSet<Pair>();

for(int i = 0; i < numbers.Length; i++)
{
    for(int j = i + 1; j < numbers.Length; j++)
    {
        pairs.Add(new Pair(numbers[i], numbers[j]));
    }
}

I think the time complexity for this looks like O(n² - 1), subtracting 1 because each iteration of j is always 1 shorter than the iteration of i.

Having done a bit of research into this kind of problem, I can't find any definitive answers as to whether this can be done faster. Are there any better solutions than O(n² - 1)?

Ian Mercer
Matthew Layton
  • I realised a while back that this specific case could be solved by adapting the Bubble Sort algorithm. BS compares every value exactly once against every other value, and your goal is to combine every value exactly once with every other value. Same algorithm, same complexity. – Christopher Apr 06 '18 at 10:50
  • 11
    As you're dealing with working code, this seems like a better fit for Code Review. – Jeroen Mostert Apr 06 '18 at 10:51
  • The only things I can think of, from experience with many languages: a) would another language compile to something faster? b) if C# is required, is a list the fastest, or would you do better with, say, a fixed-size array? – BugFinder Apr 06 '18 at 10:52
  • please check this https://codereview.stackexchange.com/a/124188/91608 – Darshan Dave Apr 06 '18 at 10:53
  • @BugFinder those are trivial optimisations. I'm purely investigating time complexity here. – Matthew Layton Apr 06 '18 at 10:57
  • @DarshanDave that algorithm doesn't look like it's any better. They're using a `foreach` loop, and inside a condition on `.Contains`, which IIRC is O(n), so the whole thing would be something like O(n*n) _I think_ – Matthew Layton Apr 06 '18 at 10:58
  • @series0ne okay – Darshan Dave Apr 06 '18 at 11:00
  • your program will not work if the element in your array is not unique, for example [1,1,2] -> the result should be (1,1), (1,2) – Pham Trung Apr 06 '18 at 11:08
  • 1
    @PhamTrung it will now. I replaced `List` with `HashSet`. – Matthew Layton Apr 06 '18 at 11:14
  • 19
    Minor quibble: you don't write O(n^2-1). You just write O(n^2). Constant factors don't matter, scalar factors don't matter - only the fastest growing term matters. – Kevin Apr 06 '18 at 13:36
  • How many unique pairs in `[ 1, 1, 1, 1, 2 ]` ? – David Conrad Apr 06 '18 at 16:48
  • @DavidConrad 2. (1,1) and (1,2) – Matthew Layton Apr 06 '18 at 16:51
  • Seems like it's only n^2 where n is the number of unique numbers present, then, not the total length of the array. Maybe apply the hash set to the numbers first, not just to the pairs at the end? – David Conrad Apr 06 '18 at 16:56

4 Answers

45

One way to think about "is there a faster way to solve the problem" is to look at the size of the output for some specific input (one you consider probably the biggest/most difficult to solve).

If the output is O(n^2) in size, then you cannot solve the problem faster than in O(n^2), because you have to spend at least O(1) on each output element.

You can see the pattern there: if you have 5 numbers in the form [1, 2, 3, 4, 5], the unique pairs break down as

4 pairs in first row
3 pairs in second row
2 pairs...
1 pair

because they look like

(1, 2), (1, 3), (1, 4), (1, 5)

(2, 3), (2, 4), (2, 5)

(3, 4), (3, 5)

(4, 5)

If you have 20 numbers in the array (in the form [1, 2, 3, ... 18, 19, 20]), it will be as follows:

19 pairs
18 pairs
...
2 pairs
1 pair

Therefore the output size is (n-1) + (n-2) + (n-3) + ... + 3 + 2 + 1. Summing this arithmetic series gives n(n-1)/2, which is O(n^2).
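The closed form is easy to check mechanically. A minimal sketch (in Java rather than the thread's C#, purely for illustration; the class and method names are mine) comparing the formula with the nested-loop count:

```java
public class PairCount {
    // Closed form for the number of unique pairs: (n-1) + (n-2) + ... + 1
    static int closedForm(int n) {
        return n * (n - 1) / 2;
    }

    // Count iterations exactly the way the question's nested loops do
    static int bruteForce(int n) {
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n++)
            if (closedForm(n) != bruteForce(n))
                throw new AssertionError("mismatch at n=" + n);
        System.out.println(closedForm(5)); // 10 pairs for [1, 2, 3, 4, 5]
    }
}
```

So the loop structure produces exactly n(n-1)/2 pairs, matching the row-by-row count above.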

What was proved?

That the worst-case scenario takes AT LEAST O(n^2).

Also note that at this moment we don't know the real worst-case complexity - the algorithm can be even slower (we have just found that some input takes O(n^2)). We know for sure that at least this input takes O(n^2); it can be faster or slower for different input.


Conclusion: We have proof that the problem takes at least O(n^2) time in the worst case, and you created an algorithm that runs in at most O(n^2) time (as described in spyc post) = you have an optimal algorithm.


Extra info on the OP's solution: Detecting collisions with a HashSet is only "pseudo-constant" - it holds for small amounts of data and "some luck". For large amounts of colliding data it degrades to O(n) per insertion, so you can end up with n^2 outputs where each takes up to n to process, which leads to n^3 complexity.

You can solve it with preprocessing the task:

1) Sort it - that takes only n log n, so it does not affect the n^2 anyway

2) Remove numbers that repeat more than twice: [1, 3, 3, 3, 5] -> [1, 3, 3, 5]. This is O(n)

3) Then use your algorithm with this update:

3.1) At the beginning of the for-i cycle: if (i > 0 && number[i] == number[i-1]) continue;

3.2) Inside the for-j cycle: remember the last pair. When adding a new pair, compare it with the last pair; if it is the same - continue;

Example:

Input: [1, 3, 3, 5]

1) i=0, j=1, number[0]=1, number[1]=3 -> add (1, 3)
2) i=0, j=2, number[0]=1, number[2]=3 -> same as last pair, use continue
3) i=0, j=3, number[0]=1, number[3]=5 -> add (1, 5)
4) i=1, j=2, number[1]=3, number[2]=3 -> add (3, 3)
5) i=1, j=3, number[1]=3, number[3]=5 -> add (3, 5)
6) i=2, before going to the j-cycle, check number[i] == number[i-1]. It is true, use continue
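The whole preprocessing idea can be sketched end to end as follows (Java rather than the thread's C#, for illustration only; the `uniquePairs` name and the two-copy cap via a list are my choices, not from the original):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UniquePairs {
    // Sketch: sort (n log n), keep each value at most twice (n), then run
    // the nested loops with the two skip rules, so no hash set is needed.
    static List<int[]> uniquePairs(int[] input) {
        int[] numbers = input.clone();
        Arrays.sort(numbers); // step 1: O(n log n)

        // Step 2: keep each value at most twice: [1, 3, 3, 3, 5] -> [1, 3, 3, 5]
        List<Integer> kept = new ArrayList<>();
        for (int x : numbers) {
            int k = kept.size();
            if (k >= 2 && kept.get(k - 1) == x && kept.get(k - 2) == x) continue;
            kept.add(x);
        }

        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < kept.size(); i++) {
            // Step 3.1: skip a value already used as the first element
            if (i > 0 && kept.get(i).equals(kept.get(i - 1))) continue;
            boolean havePair = false;
            int lastSecond = 0; // only read once havePair is true
            for (int j = i + 1; j < kept.size(); j++) {
                // Step 3.2: skip a pair identical to the one just emitted
                if (havePair && kept.get(j) == lastSecond) continue;
                pairs.add(new int[] { kept.get(i), kept.get(j) });
                lastSecond = kept.get(j);
                havePair = true;
            }
        }
        return pairs;
    }
}
```

On [1, 3, 3, 3, 5] this emits (1, 3), (1, 5), (3, 3), (3, 5), matching the trace above.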
libik
  • I feel like this answer should have more up-votes, because it definitively says "You have optimal solution" – Matthew Layton Apr 06 '18 at 11:24
  • 1
    @series0ne - I feel the same :D, however it took me more time to write it. You can accept my answer as "the most correct one". – libik Apr 06 '18 at 11:25
  • Thanks for adding the bit about detecting collisions with `HashSet`. That's also really useful! – Matthew Layton Apr 06 '18 at 12:53
  • 1
    *"What was prov[n]? That the algorithm needs AT LEAST `O(n²)` processing."* Not 100% correct. You have proven that the complexity enumerating the pairs is in `Ω(n²)` in the worst case. (Big-O denotes an upper bound only.) – fabian Apr 06 '18 at 16:06
  • @fabian - yes, thats good point. I was thinking how to put it in answer and not confuse people as usually if someone see `Ω` he/she automatically assumes "ah, I saw that many times, it is lower bound for algorithm, so any input takes at least `n^2`" - which is not true) – libik Apr 06 '18 at 18:49
  • @libik: If you’re (rightfully!) concerned about people being confused due to commonly misused terminology, a good starting point is to use the terminology correctly yourself! – wchargin Apr 07 '18 at 05:48
  • Thanks: I do get tired of people telling me HashSets are `O(1)`... that said, sorting can also take a longer than `n log n` depending on the algorithm (e.g. Quicksort worst case `n^2`, so still doesn't change the overall complexity as you said), but some do provide this as a guarantee (I think .NET mostly uses merge sorts, which do). – VisualMelon Apr 07 '18 at 10:07
  • @wchargin - I do believe that my usage of complexity operators is still correct in that answer. We are looking for `O(...)` in worst case, so I am saying it will be at least this: `O(...)` (which is not wrong answer). Introducing `Ω` would require extra information, making the answer too long and not that relevant to the question itself – libik Apr 08 '18 at 08:25
  • @VisualMelon - Sorting can be done in real `n log n` in worst-case by several algorithms (including merge sort), so we can definitely say "sorting takes `n log n`". If bubble sort or quick sort takes `n^2` in worst-case and my very-bad-sort takes `n^3` that really does not affect anything. PS: If you are wondering why someone is using quick sort if it is `n^2`, the answer is easy - in average case it takes `n log n` (hard to prove, you have to think about all possible permutation and count the average), it has O(1) memory and low hidden constant (i.e. it does not swap numbers that much) – libik Apr 08 '18 at 08:31
  • @libik perhaps I wasn't clear: I was citing Merge Sort as a sort that is worst-case `n log n`. I appreciated your answer because it didn't assume that the hashset will perform well, and just wanted to clarify that point with regard to sorting also. – VisualMelon Apr 08 '18 at 08:52
13

It goes as follows:

first for loop - O(n)
    second for loop - O(n-1)
 

Optimal time complexity:

n(n-1)/2

  • Even though that one fewer iteration is negligible, you should calculate the time complexity for the worst-case scenario, which is O(n^2)

You can also use the permutation and combination formulas to count the number of groupings directly. For example, the number of ordered selections of k items from n is:

P(n, k) = n! / (n-k)!

If you have 6 digits {0,1,2,3,4,5} (n=6), and you want to know how many different permutations you can make, i.e. (3,5), (5,3), etc., then with k=2 (two digits in each group) the number of permutations is:

P(6, 2) = 6! / 4! = 30 different permutations. Do note though that in this case (3,5) and (5,3) are counted individually, so order matters. If you want (5,3) and (3,5) to be counted as one combination, then the equation is the binomial coefficient:

C(n, k) = n! / (k! (n-k)!), giving C(6, 2) = 15


Example implementation, calculating permutations!

static long factorial(long x) // calcs the factorial, TimeCmplx = O(n)
{
    if (x <= 1) // base case covers x == 0 too, so factorial(n - k) works when k == n
        return 1;
    return x * factorial(x - 1);
}

static long permutations(long n, long k) // check that n >= k >= 0
{
    // Permutations: n!/(n-k)!
    return factorial(n) / factorial(n - k);
}
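A combinations function fits in the same style. A small Java sketch mirroring the C# above (the `combinations` helper is my addition, implementing the binomial-coefficient formula from the answer):

```java
public class Counting {
    // Same recursive factorial as above, with the x <= 1 base case
    static long factorial(long x) {
        return x <= 1 ? 1 : x * factorial(x - 1);
    }

    // Ordered selections: n! / (n-k)!
    static long permutations(long n, long k) {
        return factorial(n) / factorial(n - k);
    }

    // Unordered selections (binomial coefficient): n! / (k! * (n-k)!)
    static long combinations(long n, long k) {
        return factorial(n) / (factorial(k) * factorial(n - k));
    }

    public static void main(String[] args) {
        System.out.println(permutations(6, 2)); // 30 ordered pairs
        System.out.println(combinations(6, 2)); // 15 unordered pairs
    }
}
```

Note that n!/(k!(n-k)!) with k=2 simplifies to n(n-1)/2, the same count the top answer derives by summing the rows.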
Yuval
6

I think the time complexity for this looks like O(n² - 1), subtracting 1 because iteration of j is always 1 shorter than i

If it mattered (in big-O notation you usually only write the term with the fastest growth): you have iterations of i over [0,n), each containing an iteration of j over [i+1,n), so the number of iterations is (n∙(n-1))/2, not n²-1.

Also, your edit changing to a HashSet rather than a List changes the worst-case execution, though not the amortised value - if Pair.GetHashCode() were always to return the same value, you'd have bumped it up to O(n³), as in cases where collisions are common, hash-set insertion becomes O(n) rather than constant.
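To illustrate the point, here is a Java analogue (Java's `hashCode` plays the role of C#'s `GetHashCode`; the constant return value is deliberately pathological, and this `Pair` class is my sketch, not the OP's):

```java
import java.util.HashSet;
import java.util.Set;

public class BadHash {
    // A pair whose hashCode always collides: every value lands in the same
    // bucket, so each insertion must be compared against the entries already
    // there, losing the expected O(1) behaviour of the hash set.
    static final class Pair {
        final int a, b;
        Pair(int a, int b) { this.a = a; this.b = b; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair p = (Pair) o;
            return a == p.a && b == p.b;
        }

        @Override public int hashCode() { return 42; } // worst case: constant
    }

    public static void main(String[] args) {
        Set<Pair> pairs = new HashSet<>();
        int[] numbers = { 1, 2, 3, 4, 5 };
        for (int i = 0; i < numbers.length; i++)
            for (int j = i + 1; j < numbers.length; j++)
                pairs.add(new Pair(numbers[i], numbers[j]));
        System.out.println(pairs.size()); // still 10, but each add scans the bucket
    }
}
```

The result is still correct; only the cost changes, which is exactly the worst-case-vs-amortised distinction made above.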

Pete Kirkham
  • Sorry, could you elaborate on that last paragraph. I'm not entirely sure what you mean? – Matthew Layton Apr 06 '18 at 11:28
  • Although the amortised cost of adding n items to a List is O(n), the cost of adding an item to a hashset depends on whether the hashset has to scan for a hash collision. So if your implementation of Pair has a poor implementation of GetHashCode, every time you add something to the hash set it needs to scan through every item already in the hash set, effectively adding another nested loop into the procedure. – Pete Kirkham Apr 06 '18 at 13:19
1

This is an area-of-a-triangle algorithm.

You have N inputs, and you want a triangle of outputs.

Your output triangle has rows of length N-1 down to 1, so its size is:

(N-1) + (N-2) + ... + 1 = N * (N-1) / 2
                        = (N^2 - N) / 2

Since (n^2 - n)/2 grows as n^2, O(n^2) will always be the minimum/optimal algorithm cost!

Oreo