2

Hi I have a python problem whereby I have to count which list elements contain the value 2, in a list with various levels of nest. E.g.:

my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]

this list can have have up to three levels of nesting but could also be two or only one level deep.

I have piece of code which sort of works:

count = 0
for i in my_list:
    if len(i) > 1:
        for x in i:
            if 2 in x:
                count += 1
    elif i == 2:
        count += 1

However, apart from being very ugly, this does not take account of the potential to have a list with a single element which is 2. It also doesn't work to get len() on a single int.

I know list comprehension should be able to take care of this but I am a litle stuck on how to deal with the potential nesting.

Any help would be much appreciated.

Darwin Tech
  • 18,449
  • 38
  • 112
  • 187
  • 1
    What's the expected output on the example you give? – NPE Dec 10 '12 at 20:13
  • Are you doing this to understand how to navigate list structures, or do you just need a count? There's already an answer using flatten that's solid. I you want to do this manually, then you'll need recursion. – RonaldBarzell Dec 10 '12 at 20:26
  • so the output would be: count = 3. – Darwin Tech Dec 10 '12 at 20:28
  • Do you ever need to handle, for example, `[2, [2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]`, and if so, what result do you need? (4?) – jimhark Dec 10 '12 at 20:59

3 Answers3

7

I'd use a variant of the flatten() generator from https://stackoverflow.com/a/2158532/367273

The original yields every elements from an arbitrarily nested and irregularly shaped structure of iterables. My variant (below) yields the innermost iterables instead of yielding the scalars.

from collections import Iterable

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and any(isinstance(subel, Iterable) for subel in el):
            for sub in flatten(el):
                yield sub
        else:
            yield el

my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(sum(1 for el in flatten(my_list) if 2 in el))

For your example it prints 3.

Community
  • 1
  • 1
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Very nice solution... perhaps include the definition of flatten in your answer to make this easier to understand without needing to go to the other answer as well? – EEP Dec 10 '12 at 20:24
  • Where does `collections` come from in this case? – Darwin Tech Dec 10 '12 at 20:33
  • `import collections`. This solution counts *all* the 2's. – jimhark Dec 10 '12 at 20:38
  • This is fantastic. I tested in a more complex example and it turns up the right solution. I assumed going into this that this would be a very common design pattern. Is it not? – Darwin Tech Dec 10 '12 at 20:40
  • @DarwinTech: I think Python generators are great, and I think this is a good use case for them. – NPE Dec 10 '12 at 20:42
  • Thanks again. Definitely an area of python which deserves more of my attention. – Darwin Tech Dec 10 '12 at 20:45
0

Update and final answer:

My solution recurses a nested list of lists. If the list contains 2, the count is set to one. Then the sum of the counts of any sub-lists are added. This approach supports heterogeneous lists where numbers and lists can be mixed at the same level, as in the second usage example below:

import collections

def list_count(l):
    count = int(2 in l)
    for el in l:
        if isinstance(el, collections.Iterable):
            count += list_count(el)
    return count

Here are a couple of test cases:

my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print = list_count(my_list)
# 3 is printed

my_list = [2, [2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print = list_count(my_list)
# 4 is printed

@Akavall's answer reminded me this could be collapsed to a one-liner. (But I always get (usually justified) readability complaints on SO when I do this.)

def list_count(l):
    return ( int(2 in l) +
        sum([list_count(el) for el in l 
            if isinstance(el, collections.Iterable)]) )

Original Answer (not what original question was looking for)

Update: @NPE updated his answer after expected result was specified. He's current answer works as the original poster desires.

@NPE's (original) answer is close, but you asked:

count which list elements contain the value 2

Which I read as you needing:

my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(sum([1 for e in [flatten(sl) for sl in ml] if 2 in e]))

But he's looking for 3. My code generates 2 because it iterates across each top level element and counts it if it contains any 2, but that's not what he actually wants.

jimhark
  • 4,938
  • 2
  • 27
  • 28
0

Here is another way to do this:

small_list = [2]
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
another_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1], 2]

from collections import Iterable 

def count_lists_w_two(x):
    if isinstance(x, Iterable) == False:
        return 0
    else:
        return (2 in x) + sum(count_lists_w_two(ele) for ele in x)

Result:

>>> count_lists_w_two(small_list)
1
>>> count_lists_w_two(my_list)
3
>>> count_lists_w_two(another_list)
4
Akavall
  • 82,592
  • 51
  • 207
  • 251
  • Why not instead of `any(ele == 2 for ele in x)` just use `2 in x`? – jimhark Dec 10 '12 at 21:56
  • @jimhark, I wasn't even thinking about `2 in x`. Is it better? It is shorter, of course, but I think both are pretty clear. Do you think `2 in x` has an advantage in speed? – Akavall Dec 10 '12 at 22:05
  • you're writing code to implement a built-in which usually isn't a good idea. A reader has more to read, and in this case may re-read asking, "why not use the built-in"? I fully expect your approach to be slower, but I make it a policy not guess about Python performance. I'll leave the performance testing to you. Please share any results. – jimhark Dec 10 '12 at 22:11
  • @jimhark, the build-in is about 3 times faster according my rather sloppy test, I guess enough reason for me to edit my code. Thanks for pointing that out. – Akavall Dec 10 '12 at 22:30
  • Yeah, I couldn't help but test too. After a quick test using `timeit`, it looks like the built in is up to almost 20 times faster when the list is short or the match is early in the list. For longer lists with a match near the end, it drops off to almost 4 times faster. – jimhark Dec 10 '12 at 22:30
  • Your `is_two_in` is now so simple you might think about calling the build-in `in` directly. – jimhark Dec 10 '12 at 22:32
  • Yeah, that makes sense, I also changed the name of my `count_twos` to something more understandable. – Akavall Dec 10 '12 at 22:42