I'm looking for a quick way to do the following: Say I have an array

X = np.array([1,1,1,2,2,2,2,2,3,3,1,1,0,0,0,5])

Instead of a simple frequency count of the elements, I'm looking for the length of each consecutive run. So first 1 repeats 3 times, then 2 repeats 5 times, then 3 repeats 2 times, etc. So if freq is my function, then:

Y = freq(X)
Y = np.array([[1,3],[2,5],[3,2],[1,2],[0,3],[5,1]])

For example, I can write this with loops like this:

def freq(X):
    i=0        
    Y=[]
    while i<len(X):
        el = X[i]
        el_count=0
        while X[i]==el:
            el_count +=1
            i+=1
            if i==len(X):
                break            
        Y.append(np.array([el,el_count]))

    return np.array(Y)

I'm looking for a faster and nicer way to do this. Thanks!

Yevhen Kuzmovych
Mike Azatov
  • Use `itertools.groupby` – Dani Mesejo Oct 03 '19 at 14:29
  • Possible duplicate of [find length of sequences of identical values in a numpy array (run length encoding)](https://stackoverflow.com/questions/1066758/find-length-of-sequences-of-identical-values-in-a-numpy-array-run-length-encodi) – Randy Oct 03 '19 at 14:30

3 Answers


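Here's one vectorized NumPy approach, for performance. It marks the positions where a new run starts, then takes the differences between those positions to get the run lengths: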

In [14]: m = np.r_[True,X[:-1]!=X[1:],True]

In [21]: counts = np.diff(np.flatnonzero(m))

In [22]: unq = X[m[:-1]]

In [23]: np.c_[unq,counts]
Out[23]: 
array([[1, 3],
       [2, 5],
       [3, 2],
       [1, 2],
       [0, 3],
       [5, 1]])
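
The steps above can be wrapped into a reusable function. This is only a minimal sketch: the name `run_lengths` and the empty-input guard are assumptions added for illustration and are not part of the original session.

```python
import numpy as np

def run_lengths(x):
    """Return an (n_runs, 2) array of [value, run length] pairs.

    Hypothetical helper name; it assumes a 1-D input.
    """
    x = np.asarray(x)
    if x.size == 0:
        # Assumption: an empty input yields an empty (0, 2) result.
        return np.empty((0, 2), dtype=x.dtype)
    # True at the start of every run, plus a sentinel True past the end.
    m = np.r_[True, x[:-1] != x[1:], True]
    # Distances between consecutive run starts are the run lengths.
    counts = np.diff(np.flatnonzero(m))
    # Dropping the sentinel leaves a mask that selects each run's first element.
    values = x[m[:-1]]
    return np.column_stack((values, counts))

X = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 1, 1, 0, 0, 0, 5])
print(run_lengths(X))
```

The guard is there because, for an empty array, the mask would consist of the two sentinel values only and would report one spurious run of length 1.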
Divakar

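If you only need the total count of each value, sorted by value, there is numpy.unique. Note that it does not separate consecutive runs: the two runs of 1 in the example are merged into a single count of 5.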

Code

import numpy as np

X = [1,1,1,2,2,2,2,2,3,3,1,1,0,0,0,5]

# Total count of each distinct value, sorted by value
uniq, freq = np.unique(X, return_counts=True)
print(np.column_stack((uniq, freq)))

Output

[[0 3]
 [1 5]
 [2 5]
 [3 2]
 [5 1]]
ifconfig
Mike-O

You can use itertools.groupby to perform the operation without invoking numpy.

import itertools

X = [1,1,1,2,2,2,2,2,3,3,1,1,0,0,0,5]

Y = [(x, len(list(y))) for x, y in itertools.groupby(X)]

print(Y)
# [(1, 3), (2, 5), (3, 2), (1, 2), (0, 3), (5, 1)]
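
If the result is needed as a two-column NumPy array, as in the question, the same idea can be sketched as follows. The function name `freq_groupby` is hypothetical, and counting with `sum(1 for _ in group)` is a substitution that avoids building a temporary list for each run.

```python
import itertools
import numpy as np

def freq_groupby(X):
    """Hypothetical helper: [value, run length] rows built via itertools.groupby."""
    # groupby yields one (key, iterator) pair per run of consecutive equal items.
    pairs = [(key, sum(1 for _ in group)) for key, group in itertools.groupby(X)]
    return np.array(pairs)

X = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 1, 1, 0, 0, 0, 5]
print(freq_groupby(X))
```

This version iterates in pure Python, so on large arrays it will likely be slower than a vectorized NumPy approach, but it works on any iterable.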
Chris Mueller