There are many ways to do this. If you're using numpy, you could just use np.count_nonzero:
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])
>>> a != b
array([False, False, False, True, True, False, True, False, False], dtype=bool)
>>> np.count_nonzero(a != b)
3
Note that a != b returns a boolean array that is True at each index where the elements differ and False where they match. np.count_nonzero then counts the True entries, which is the number of differing positions.
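The same boolean mask can be reused for more than counting. The following is a small sketch with the arrays above. Summing the mask gives the same count because True acts as 1. np.flatnonzero, which is not part of the original answer, returns the positions where the arrays differ.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])

# Element-wise comparison: one boolean per index.
mask = a != b

# count_nonzero treats True as nonzero, so this counts mismatches.
n_diff = np.count_nonzero(mask)

# Summing the mask gives the same count (True == 1, False == 0).
n_diff_sum = int(mask.sum())

# flatnonzero returns the indices where the mask is True.
diff_idx = np.flatnonzero(mask)

print(n_diff, n_diff_sum, diff_idx.tolist())  # 3 3 [3, 4, 6]
```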
Here's a speed comparison:
>>> %timeit np.count_nonzero(a != b)
The slowest run took 40.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 752 ns per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
The slowest run took 5.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 18.5 µs per loop
Possible caching makes the best-case timings less reliable, but even the slowest runs favor numpy: 40.59 * 0.752 µs ≈ 30.52 µs, versus 5.86 * 18.5 µs ≈ 108.41 µs for pure Python. Numpy's slowest run is therefore still roughly 3.5 times faster than pure Python's slowest run.
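%timeit is an IPython magic, so it isn't available in a plain Python script. Here is a minimal sketch that runs the same comparison with the standard-library timeit module. The function names numpy_count and python_count and the loop count are illustrative choices, not taken from the answer. Absolute timings will depend on your machine and numpy version.

```python
import timeit

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])


def numpy_count():
    # Vectorised: one comparison over the whole array, then a count.
    return np.count_nonzero(a != b)


def python_count():
    # Pure Python: iterate over element pairs and add up the booleans.
    return sum(i != j for i, j in zip(a, b))


# Sanity check: both approaches must agree before timing them.
assert numpy_count() == 3 and python_count() == 3

loops = 10_000
for name, func in [("numpy", numpy_count), ("pure python", python_count)]:
    # repeat() returns the total time of each repetition; keep the best
    # one and divide by the loop count to get seconds per call.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    print(f"{name}: {best / loops * 1e6:.2f} µs per loop")
```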
This is much clearer with larger arrays:
>>> n = 10000
>>> a = np.arange(n)
>>> b = np.arange(n)
>>> k = 50
>>> ids = np.random.randint(0, n, k)
>>> a[ids] = 0
>>> ids = np.random.randint(0, n, k)
>>> b[ids] = 0
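The setup above is random, so the number of differences changes from run to run. The following sketch makes it reproducible and cross-checks the count. It assumes a seeded np.random.default_rng Generator in place of np.random.randint, and the seed 0 is arbitrary. An index can only differ if exactly one of the two arrays was zeroed there. Index 0 never counts, because np.arange already puts 0 at index 0 in both arrays.

```python
import numpy as np

n, k = 10000, 50
# Assumption: seeded Generator (arbitrary seed) for reproducible ids.
rng = np.random.default_rng(0)

a = np.arange(n)
b = np.arange(n)
ids_a = rng.integers(0, n, k)  # may contain repeated indices
ids_b = rng.integers(0, n, k)
a[ids_a] = 0
b[ids_b] = 0

# Positions zeroed in exactly one array (symmetric difference),
# excluding index 0, which is already 0 in both arrays.
expected = len((set(ids_a.tolist()) ^ set(ids_b.tolist())) - {0})

count = np.count_nonzero(a != b)
print(count, expected)
```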
>>> %timeit np.count_nonzero(a != b)
The slowest run took 20.50 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.5 µs per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
100 loops, best of 3: 15.6 ms per loop
The difference is much starker. Even numpy's slowest run takes at most about 236 µs (20.50 * 11.5 µs), while pure Python takes 15.6 ms per loop, and that is its best time!