extracting item with most common probability in python list

Question

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -

 while(i<dlistlen):
  j=i+1
  while(j<dlistlen):
   il = dlist1[i]
   jl = dlist1[j]

   tmp = list(set(il) & set(jl))
   print tmp 

  #print i,j
   j=j+1 
  i=i+1

this is giving me output :

[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]

Looks like I am close to getting [1,2,3,7] as my final answer, but can't figure out how. Please note, in the very first list (([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] )) there may be more items leading to one more final answer besides [1,2,3,4]. But as of now, I need to extract only [1,2,3,7] . Please note, this is not kind of homework, I am creating own clustering algorithm that fits my need.

Take a look to [this](http://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list). It's not intersection but frequency distribution. — Adriano Repetti, Mar 26 '13 at 11:53
I don't understand how `[1,2,3,7]` is the "item with the most common probability" — YXD, Mar 26 '13 at 11:53
Could you explain how you get from `[[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]` to `[1,2,3,7]`. — NPE, Mar 26 '13 at 11:53
@NPE realised that.. This question is strange, at least... Maybe all items that are included more than once? — ppeterka, Mar 26 '13 at 11:55
@ppeterka: I don't know about strange, but it's certainly incomplete. — NPE, Mar 26 '13 at 11:55
sorry for confusion, dlist1 = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and dlistlen= dlistlen = len(dlist1) Hope, this clarifies How I got [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] to [1,2,3,7]. — akshayb, Mar 26 '13 at 12:02
Thanks, as I said above it is a clustering algorithm and since it is under development, I know 1,2,3,7 is the right cluster. It is kind of reverse engineering. — akshayb, Mar 26 '13 at 12:05
this works perfectly given your description: `def return1237(): return [1,2,3,7]` — Andrew Jaffe, Mar 26 '13 at 12:09

Kevin · Accepted Answer · 2013-03-26T12:07:26.560

2

You can use the Counter class to keep track of how often elements appear.

>>> from itertools import chain
>>> from collections import Counter
>>> l =  [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]

edited Mar 26 '13 at 12:07

answered Mar 26 '13 at 11:59

Kevin

74,910
12
133
166

+1, straight. [:4] is pretty arbitrary but he didn't explain which is his criterion. – Adriano Repetti Mar 26 '13 at 12:05
@Adrino, 4 is not my criteria, but now we can think of something like most frequently occurred etc. thanks. Now I will try on a larger data. – akshayb Mar 26 '13 at 12:08

Lauritz V. Thaulow · Answer 2 · 2013-03-26T12:40:50.400

1

It looks like you're trying to find the largest intersection of two list elements. This will do that:

from itertools import combinations

# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]

intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)

edited Mar 26 '13 at 12:40

answered Mar 26 '13 at 12:06

Lauritz V. Thaulow

49,139
12
73
92

extracting item with most common probability in python list

2 Answers2