0

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -

 while(i<dlistlen):
  j=i+1
  while(j<dlistlen):
   il = dlist1[i]
   jl = dlist1[j]

   tmp = list(set(il) & set(jl))
   print tmp 

  #print i,j
   j=j+1 
  i=i+1

this is giving me output :

[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]

Looks like I am close to getting [1,2,3,7] as my final answer, but can't figure out how. Please note, in the very first list (([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] )) there may be more items leading to one more final answer besides [1,2,3,4]. But as of now, I need to extract only [1,2,3,7] . Please note, this is not kind of homework, I am creating own clustering algorithm that fits my need.

akshayb
  • 1,219
  • 2
  • 18
  • 44
  • 1
    Take a look to [this](http://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list). It's not intersection but frequency distribution. – Adriano Repetti Mar 26 '13 at 11:53
  • 3
    I don't understand how `[1,2,3,7]` is the "item with the most common probability" – YXD Mar 26 '13 at 11:53
  • Could you explain how you get from `[[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]` to `[1,2,3,7]`. – NPE Mar 26 '13 at 11:53
  • @NPE realised that.. This question is strange, at least... Maybe all items that are included more than once? – ppeterka Mar 26 '13 at 11:55
  • 1
    @ppeterka: I don't know about strange, but it's certainly incomplete. – NPE Mar 26 '13 at 11:55
  • sorry for confusion, dlist1 = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and dlistlen= dlistlen = len(dlist1) Hope, this clarifies How I got [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] to [1,2,3,7]. – akshayb Mar 26 '13 at 12:02
  • Thanks, as I said above it is a clustering algorithm and since it is under development, I know 1,2,3,7 is the right cluster. It is kind of reverse engineering. – akshayb Mar 26 '13 at 12:05
  • this works perfectly given your description: `def return1237(): return [1,2,3,7]` – Andrew Jaffe Mar 26 '13 at 12:09
  • @AndrewJaffe thanks,but looking for a generalized solution. – akshayb Mar 26 '13 at 12:11

2 Answers2

2

You can use the Counter class to keep track of how often elements appear.

>>> from itertools import chain
>>> from collections import Counter
>>> l =  [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
Kevin
  • 74,910
  • 12
  • 133
  • 166
  • +1, straight. [:4] is pretty arbitrary but he didn't explain which is his criterion. – Adriano Repetti Mar 26 '13 at 12:05
  • @Adrino, 4 is not my criteria, but now we can think of something like most frequently occurred etc. thanks. Now I will try on a larger data. – akshayb Mar 26 '13 at 12:08
1

It looks like you're trying to find the largest intersection of two list elements. This will do that:

from itertools import combinations

# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]

intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)
Lauritz V. Thaulow
  • 49,139
  • 12
  • 73
  • 92