2

I am trying to find an algorithm that returns the number of pairs of duplicates in a list.

Example: Input: [13,4,8,4,13,7,13,9,13] Output: 7 (4 13's comes out to 6 pairs and two 4's comes out to 1 pair )

Can my algorithm become more efficient? I would like it to be faster than Theta(n^2)

Here is what I have:

my_List=[13,3,8,3,13,7,13,9,13]

pairs=0
alreadySeen=[]

for element in my_List:
  howMany=0
  if element in alreadySeen:
    False
   else:
    howMany=my_List.count(element)
    pairs=pairs+((howMany*(howMany-1))/2)
    howMany=0
    alreadySeen.append(element)

print(pairs)
JamalCrawford..
  • 177
  • 2
  • 12
  • 1
    how can `4 13's comes out to 6 pairs` , why not `2 pairs` ? – Mitesh Pant Feb 02 '17 at 05:01
  • I do not know `python`, I can write this is `java/php/js/perl/c/c++` – Mitesh Pant Feb 02 '17 at 05:02
  • 4 13's will make 6 different unique pairs. (13.1,13.2) (13.1,13.3) (13.1,13.4) (13.2, 13.3) (13.2, 13.4) (13.3, 13.4) – JamalCrawford.. Feb 02 '17 at 05:09
  • pseudocode is fine as well, its the efficiency I am looking for. – JamalCrawford.. Feb 02 '17 at 05:10
  • Two options come to mind: 1. Use a while loop instead of iterating on elements, then remove the already seen elements after they're counted so you operate on a smaller list. 2. Sort the list, and then count only consecutives. Find a simple equation to go from for example 4 identical numbers to yielding 6 pairs. – Brian Feb 02 '17 at 05:12
  • 2is1, 3is3, 4is6, 5is10, 6is15,... so pairs=sum (range (1,N)) – Brian Feb 02 '17 at 05:17

5 Answers5

3

Here is an algorithm that runs in O(N).

  1. Iterate over the elements once to create a dict of each element and its count. The output of this step for your example is {13: 4, 4:2, 8:1, ...}
  2. Iterate over that dict and calculate the number of pairs for each element. The number of pairs for each element can be thought of as selecting 2 items from a list of N elements. This could be done by calculating the combinations without repetitions using the formula (N * (N-1)) / 2. So for 4 elements, there are (4 * 3) / 2 = 6 pairs.
Hesham Attia
  • 979
  • 8
  • 13
  • Why isn't O(N^2) ? You say to iterate over all the elements one time for the dict and then again for the calculations? – JamalCrawford.. Feb 02 '17 at 05:25
  • 3
    @JamalCrawford.. doing something twice makes it O(2N) which is just O(N) because constant values are ignored in Big O notation: http://stackoverflow.com/questions/22188851/why-is-constant-always-dropped-from-big-o-analysis – samgak Feb 02 '17 at 05:31
2

@Hesham Attia already provided the correct algorithm, here's simple implementation with Counter:

>>> from collections import Counter
>>> l = [13,4,8,4,13,7,13,9,13]
>>> sum(x * (x - 1) // 2 for x in Counter(l).values())
7
niemmi
  • 17,113
  • 7
  • 35
  • 42
0

Here is a javascript code, you can convert this to phython code, the complexity is linear ~ O(n)

var arr = [13,3,8,3,13,7,13,9,13];
var count = {};
var totalpairs =0;
for(var i=0;i<arr.length;i++){
  if(count[arr[i]]){
    count[arr[i]]++;
  }else{
    count[arr[i]] =1;
  }
}

for(key in count){
  if(count[key] %2 == 0){
    totalpairs = totalpairs + count[key]/2;
  }
}

console.log(' total pairs are '+ totalpairs);
Mitesh Pant
  • 524
  • 2
  • 15
0

Here is a simple and efficient way of finding all possible duplicate pairs in a list with time complexity ~ O(N).

l = [13,3,8,3,13,7,13,9,13]
# Two pairs of 13 and One pair of 3
# Sum equals to Three
alreadySeen = []
total_no_of_pairs = 0
for i in range(len(l)):
    if l[i] not in alreadySeen:
        alreadySeen.append(l[i])
    else:
        # If element l[i] is present in alreadySeen list
        # Indicates a Pair and increments count
        # Remove element for creating a new pair
        total_no_of_pairs +=1
        alreadySeen.remove(l[i])

print(total_no_of_pairs)

Output:

3
gkedia
  • 1
  • 1
0

Time Complexity : O(N)

arr = list(map(int,input().split()))
    d = {}
    for i in range(len(arr)):
        if arr[i] in d.keys():
            d[arr[i]] += 1
        else:
            d[arr[i]] = 1
    ans = 0
    for val in d.values():
        if val > 1:
            ans += val*(val-1)//2
    print(ans)
cool
  • 327
  • 1
  • 6
  • 14