4

I am trying to figure out the fastest way to count how many time two values are located one after the other in a numpy list.

For example:

list = [1, 5, 4, 1, 2, 4, 6, 7, 2, 1, 3, 3, 1, 2] and I want to count the number of times the value 1 follows the value 2 (but not vice versa)

In the example above, the answer should be 1 since 1 follows 2 only once.

I can obviously reach the answer with a simple for-loop that adds to a counter every time the item i is equal 1 and item i-1 equals 2, but I feel that there must be a faster way to do it,

Thanks

Anatolii
  • 14,139
  • 4
  • 35
  • 65
Zennie
  • 375
  • 1
  • 3
  • 13
  • 3
    Note: don't use `list` as a variable name, as it is already a built in `dtype` in python – sacuL Jul 05 '18 at 14:21
  • Related (more general): [Python/NumPy first occurrence of subarray](https://stackoverflow.com/questions/7100242/python-numpy-first-occurrence-of-subarray) – miradulo Jul 05 '18 at 15:26
  • Your example is a bit unclear - do you have a NumPy array you are working with or a Python list that you think you can perform this operation faster on using NumPy? There's no such thing as "numpy lists". – miradulo Jul 05 '18 at 15:40

5 Answers5

5

You coud do this using np.diff and np.where:

import numpy as np

mylist = [1, 5, 4, 1, 2, 4, 6, 7, 2, 1, 3, 3, 1, 2]

# Turn your list into a numpy array
myarray = np.array(mylist)

# find occurences where myarray is 2 and the following element is 2 minus 1
np.sum((myarray[:-1] == 2) & (np.diff(myarray) == -1))

Which returns 1

Timings on a large array:

On a small list, the time difference between an iterative method and numpy methods will not be noticeable. But on a large array, as in the example below, the performance of numpy is much better.

import timeit

mylist = np.random.choice(range(0,9), 1000000)

def np_method(mylist = mylist):
    return np.sum((mylist[:-1] == 2) & (np.diff(mylist) == -1))

def zip_loop(a = mylist):
    return len( [1 for i,j in zip(a, a[1:]) if i == 2 and j == 1] )

def for_loop(list1 = mylist):
    count=0
    desired_num=2
    follower_num=1
    for i in range(len(list1)-1):
        if list1[i]==desired_num:
            if list1[i+1]==follower_num:
                count+=1
    return count

>>> timeit.timeit(np_method, number = 100) / 100
0.006748438189970329

>>> timeit.timeit(zip_loop, number = 100) / 100
0.3811768989200209

>>> timeit.timeit(for_loop, number = 100) / 100
0.3774999916599336
sacuL
  • 49,704
  • 8
  • 81
  • 106
1

Thea easiest way i can think of is to use a for loop

count=0
desired_num=2
follower_num=1
for i in range(len(list1)-1):
    if list1[i]==desired_num:
        if list1[i+1]==follower_num:
            count+=1
print("total occurance=",count)

takes : 0.0003437995910644531s on my machine

Inder
  • 3,711
  • 9
  • 27
  • 42
  • This would work quickly on a small array, but on my machine, this took about 0.3618 seconds for a large array (1,000,000 elements) – sacuL Jul 05 '18 at 14:36
  • lozz I have i 7 16 GB ram your specs please?? – Inder Jul 05 '18 at 14:37
  • 1
    i5 8GB ram. My machine is getting kind of old, but looping through a large array will usually take significantly longer than `numpy` vectorized methods – sacuL Jul 05 '18 at 14:39
  • @Inder nice, twice as fast as the accepted answer. upvoted. – lenik Jul 05 '18 at 15:36
0

I may suggest you to use slicing and comprehension to iterate through your input list as follows:

myList = [1, 5, 4, 1, 2, 4, 6, 7, 2, 1, 3, 3, 1, 2]

result = sum(myList[i:i+2] == [2,1] for i in range(len(myList)-1))

print(result) # 1

Using the zip() function can also help you:

myList = [1, 5, 4, 1, 2, 4, 6, 7, 2, 1, 3, 3, 1, 2]

result = sum((i,j) == (2,1) for (i,j) in zip(myList, myList[1:]))

print(result) # 1
Laurent H.
  • 6,316
  • 1
  • 18
  • 40
  • you're generating very long lists with mostly zeros inside, this is not very efficient waste of memory. see below for the method to generate smaller lists, that contain only one element for every occasion we're looking for. – lenik Jul 05 '18 at 14:36
-1

You should not call your variable list -- it's already used in python and very confusing.

>>> a = [1, 5, 4, 1, 2, 4, 6, 7, 2, 1, 3, 3, 1, 2]
>>> len( [1 for i,j in zip(a, a[1:]) if i == 2 and j == 1] )
1

Basically, you may put your array over itself with zip() and deal with the number pairs, looking for any combinations:

>>> zip(a, a[1:])
[(1, 5), (5, 4), (4, 1), (1, 2), (2, 4), (4, 6), (6, 7), (7, 2), (2, 1), (1, 3), (3, 3), (3, 1), (1, 2)]
lenik
  • 23,228
  • 4
  • 34
  • 43
-1

Just for the fun of it, I have timed all 4 main solutions, here are the results:

#!/usr/bin/env python

import numpy as np
import random

def f1(li):
    return np.sum((np.array(li[:-1]) == 2) & (np.diff(li) == -1))

def f2(li):
    return sum((i,j) == (2,1) for (i,j) in zip(li, li[1:]))

def f3(li):
    count=0
    desired_num=2
    follower_num=1
    for i in range(len(li)-1):
        if li[i]==desired_num:
        if li[i+1]==follower_num:
            count+=1
    return count    

def f4(li) :
    return len( [1 for i,j in zip(li, li[1:]) if i == 2 and j == 1] )

if __name__=='__main__':
    import timeit   
    import random
    s = []
    for i in range(10000000) :
        s.append( random.randint(1,10) )

    print f1(s), f2(s), f3(s), f4(s)

    print(f1(s)==f2(s)==f3(s)==f4(s))

    for f in (f1,f2,f3,f4):
        print("   {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("f(s)", setup="from __main__ import f, s", number=10)))

'''
output:

100236 100236 100236 100236
True
       f1    7.2285 secs
       f2    13.7680 secs
       f3    4.3167 secs
       f4    7.7375 secs

'''

Surprisingly, simple for loop beats numpy =)

lenik
  • 23,228
  • 4
  • 34
  • 43
  • @sacul 10000000 is a tiny number ? – lenik Jul 05 '18 at 15:15
  • fyi, I'm not the downvoter (on either of your answers). You can be skeptical, that's fine. Let's agree to disagree :). OP can choose whichever answer they agree with. Sorry for the frustration – sacuL Jul 05 '18 at 15:23
  • @sacul i don't care about downvoters. i hoped your solution really works as fast as you say. unfortunately it loses to a simple `for` loop and about as fast as `zip()` with list comprehensions. what a pity. – lenik Jul 05 '18 at 15:27
  • 1
    @lenik You are benchmarking the creation of a massive NumPy array from a list. This is not _at all_ productive. I agree it is unclear from their question whether they are really using NumPy or not, but benchmarking the creation of NumPy arrays doesn't have any significance. – miradulo Jul 05 '18 at 15:29
  • @miradulo i use the code from the answers verbatim. zipping large lists is not a free lunch either =) – lenik Jul 05 '18 at 15:33