numpy - how do I count the occurrence of items in nested lists by index?

Question

Hi, I want to count how often each item from my list occurs at a given index of a nested list.

That is if my list is

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

and my nested list looks like:

[['Three' 'One' 'Ten']
 ['Three' 'Five' 'Nine']
 ['Two' 'Five' 'Three']
 ['Two' 'Three' 'Eight']
 ['One' 'Three' 'Nine']]

How many times does 'One' occur at index 0, and so on for each item? That is what I want to know.

I am using numpy arrays to build the lists, and the output comes from a weighted random choice. I want to run the test over, say, 1000 lists and count the occurrences per index to determine how changes I make elsewhere in my program affect the end result.

I have found examples such as https://stackoverflow.com/a/10741692/461887

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
list(zip(ii.tolist(), y[ii].tolist()))  # list() is needed on Python 3, where zip is lazy
# [(1, 5), (2, 3), (5, 1), (25, 1)]

But this does not appear to work with nested lists. I have also been looking under indexing in the numpy cookbook, and at histogram and digitize in the example list, but I just can't seem to find a function that does this.

Updated to include example data output:

Assuming 100 nested lists:

{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven': 4, 'Eight': 3,
            'Nine': 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}

Or as in treddy's example

array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])

Does the location of the values in the nested list matter? The `bincount` solution would work if you just flatten the array. What do you mean by "for each item"? Is 'item' one of the sublists, or is it one of the keys? — askewchan, Nov 23 '13 at 04:56
@askewchan. I would like to know how many times 'one' 'two' 'three' etc occur at index 0. — sayth, Nov 23 '13 at 05:04
@sayth it is unclear what you are asking... could you add an example of how your output should look? — Saullo G. P. Castro, Nov 23 '13 at 07:05
@sayth I assume that the desired result is a dictionary, not a list? — Roman Pekar, Nov 23 '13 at 10:17
@RomanPekar Yes, I will edit. I am not fussy about the format as long as it's safe and reproducible; whether it is a list or a dictionary does not matter. — sayth, Nov 23 '13 at 10:20

Roman Pekar · Accepted Answer · 2013-11-23T10:22:59.110

It would help if you added the output you expect for your example, but for now it looks like collections.Counter will do the job:

>>> data = [['Three','One','Ten'],
...  ['Three','Five','Nine'],
...  ['Two','Five','Three'],
...  ['Two','Three','Eight'],
...  ['One','Three','Nine']]
... 
>>> 
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]

update:

Now that you have given the desired output, I think the idea would be: flatten the list, use Counter to count occurrences, and then create a dictionary (or an OrderedDict if order matters to you):

>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})

or, if you need only the first entry in each list:

>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})

simple dictionary (the outputs from here on use the flattened Counter):

>>> {x: c[x] for x in keys}
{
    'Twelve': 0, 'Seven': 0,
    'Ten': 1, 'Fourteen': 0,
    'Nine': 2, 'Six': 0,
    'Three': 5, 'Two': 2,
    'Four': 0, 'Eleven': 0,
    'Five': 2, 'Thirteen': 0,
    'Eight': 1, 'One': 2, 'Fifteen': 0
}

or OrderedDict:

>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])

And, just in case: if you don't need zeroes in your output, you could use the Counter directly to get the number of occurrences:

>>> c['Nine']   # Key is in the Counter, returns number of occurrences
2
>>> c['Four']   # Key is not in the Counter, returns 0
0
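
If counts are wanted for every index position rather than only index 0, one possible extension (a sketch, not part of the original answer) is to transpose the rows with zip(*data) and build one Counter per position. The names per_index and index0 are made up for illustration.

```python
from collections import Counter

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# zip(*data) transposes the rows, so each tuple holds every entry at one index
per_index = [Counter(column) for column in zip(*data)]

# counts at index 0 in the order of keys, zeros included
index0 = {k: per_index[0][k] for k in keys}
print(index0['One'], index0['Two'], index0['Three'])  # 1 2 2
```

per_index[1] and per_index[2] hold the same kind of counts for the second and third positions.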

Counter only counts within each nested list; I am trying to count the first entry of each nested list as a total across all nested lists. — sayth, Nov 23 '13 at 10:18
@sayth ah ok, then just change the Counter creation to `c = Counter(l[0] for l in data)`; the rest of the code stays the same — Roman Pekar, Nov 23 '13 at 10:21

score 3 · Answer 2 · answered Nov 23 '13 at 15:50

The OP asked a numpy question. collections.Counter and OrderedDict will certainly work, but here's a numpy answer:

In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...:         'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...:            ['Three', 'Five', 'Nine'],
...:            ['Two', 'Five', 'Three'],
...:            ['Two', 'Three', 'Eight'],
...:            ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: list(zip(keys.tolist(), counts.tolist()))
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
          0  1  2
One       1  1  0
Two       2  0  0
Three     2  2  1
Four      0  0  0
Five      0  2  0 ...
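
To make the broadcasting in the two np.sum calls easier to follow, here is a small sketch that spells out the intermediate shapes. It reuses the data from the question; the names match0, counts0, match_all and counts_all are introduced only for illustration.

```python
import numpy as np

keys = np.array(['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
                 'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen'])
data = np.array([['Three', 'One', 'Ten'],
                 ['Three', 'Five', 'Nine'],
                 ['Two', 'Five', 'Three'],
                 ['Two', 'Three', 'Eight'],
                 ['One', 'Three', 'Nine']])

# column 0 only: data[:, [0]] has shape (5, 1), keys has shape (15,),
# so the comparison broadcasts to a (5, 15) boolean array
match0 = keys == data[:, [0]]
print(match0.shape)                # (5, 15)
counts0 = match0.sum(axis=0)       # sum over the 5 rows, one count per key

# all columns: (15, 1, 1) against (5, 3) broadcasts to (15, 5, 3)
match_all = keys[:, np.newaxis, np.newaxis] == data
print(match_all.shape)             # (15, 5, 3)
counts_all = match_all.sum(axis=1) # sum over the 5 rows, shape (15, 3)
```

Each row of counts_all belongs to one key and each column to one index position, so counts_all[:, 0] equals counts0.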

+1, good one, just thought that I should mention standard python collections — Roman Pekar, Nov 24 '13 at 09:49

treddy · Answer 3 · 2013-11-23T07:43:22.850

You are correct that numpy.bincount accepts only a 1D array-like object, so a nested list or an array with more than one dimension can't be used directly. You can, however, use numpy array slicing to select the first column of your 2D array and count the occurrences of each integer in that column:

import numpy
keys = numpy.arange(1, 16)  # not strictly needed for the count below
two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])
numpy.bincount(two_dim_array_for_counting[..., 0])  # count over all rows of the first column only
Out[36]: array([0, 1, 2, 2])  # 0 occurs 0 times, 1 occurs once, 2 occurs twice, and 3 occurs twice

No values greater than 3 occur in the first column, so the output array has only 4 elements, counting the occurrences of 0, 1, 2, and 3 in the first column.
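
For the string data in the question, one way to apply the same idea (a sketch under assumptions, not something from the original post) is to map each key to its position in keys first, and to pass minlength so that keys that never occur still get a zero. The name code_of is made up for illustration.

```python
import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = np.array([['Three', 'One', 'Ten'],
                 ['Three', 'Five', 'Nine'],
                 ['Two', 'Five', 'Three'],
                 ['Two', 'Three', 'Eight'],
                 ['One', 'Three', 'Nine']])

# hypothetical lookup table: position of each key in `keys`
code_of = {k: i for i, k in enumerate(keys)}

# translate the first column into integer codes that bincount can handle
codes = np.array([code_of[s] for s in data[:, 0]])

# minlength pads the result so there is one slot per key, zeros included
counts = np.bincount(codes, minlength=len(keys))
print(counts.tolist())  # [1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The result lines up with keys position by position, which is the array form of output shown in the question.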
