Python code to return the total number of positions at which items differ at the same index

Question

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]

I have to compare these two lists and return the number of positions at which the items differ, using one line of Python code.

For example, the output should be 3 for the given lists, because the items differ at indices 3, 4, and 6. So the program should return 3.

My way of doing this is to compare every position using a for loop:

count = 0
for i in range(len(A)):
    # Count only the positions where the two items differ.
    if A[i] != B[i]:
        count += 1
print(count)

Please help me write this as one line of Python code.

Correction: `sum(a != b for a, b in zip(A, B))` (Thanks @acw1668.) — Steven Rumbalski, Oct 18 '16 at 03:15

score 2 · Accepted Answer · edited May 23 '17 at 12:25

count = sum(a != b for a, b in zip(A, B))
print(count)

or just print(sum(a != b for a, b in zip(A, B)))
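As a minimal sketch of why the one-liner works: zip pairs up the items at each index, each comparison yields a bool, and sum treats True as 1. The helper name count_differences and the differing list are made up for illustration and are not part of the answer above.

```python
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]


def count_differences(xs, ys):
    """Count positions where xs and ys hold different items (hypothetical helper)."""
    # zip pairs items by index; each comparison is a bool, and True sums as 1.
    return sum(x != y for x, y in zip(xs, ys))


# The indices that differ, handy for checking the count by eye.
differing = [i for i, (x, y) in enumerate(zip(A, B)) if x != y]

print(count_differences(A, B))  # 3
print(differing)                # [3, 4, 6]
```

Note that zip stops at the end of the shorter input, so if the lists have different lengths the extra trailing items are not counted.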

It is worth reading up on zip, lambda, and map; these tools are very powerful and important in Python.

There are also other ways to use these tools that are worth looking into.

Have fun!!

answered Oct 18 '16 at 03:24

Lucas Batista Gabriel


score 1 · Answer 2 · edited May 23 '17 at 12:02

There are many ways to do this. If you're using numpy, you could just use np.count_nonzero:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])
>>> a != b
array([False, False, False,  True,  True, False,  True, False, False], dtype=bool)
>>> np.count_nonzero(a != b)
3

Note that a != b returns a boolean array holding True or False at each index, depending on how the condition evaluates there.
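The same boolean array can also tell you where the items differ, not just how many differ. This is a small sketch rather than part of the original answer; the variable names mask, count, and indices are chosen here for illustration, and np.flatnonzero is used to list the positions of the True entries.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])

mask = a != b                        # boolean array, True where items differ
count = int(np.count_nonzero(mask))  # number of True entries
indices = np.flatnonzero(mask)       # positions of the True entries

print(count)             # 3
print(indices.tolist())  # [3, 4, 6]
```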

Here's a speed comparison:

>>> %timeit np.count_nonzero(a != b)
The slowest run took 40.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 752 ns per loop

>>> %timeit sum(i != j for i, j in zip(a, b))
The slowest run took 5.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 18.5 µs per loop

The caching obscures the timing, but the slowest runs can be estimated: 40.59 × 0.752 µs ≈ 30.52 µs for numpy, versus 5.86 × 18.5 µs ≈ 108.41 µs for pure Python. So numpy's slowest run still seems significantly faster than pure Python's slowest run.

This is much clearer with larger arrays:

>>> n = 10000
>>> a = np.arange(n)
>>> b = np.arange(n)
>>> k = 50
>>> ids = np.random.randint(0, n, k)
>>> a[ids] = 0
>>> ids = np.random.randint(0, n, k)
>>> b[ids] = 0
>>> %timeit np.count_nonzero(a != b)
The slowest run took 20.50 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.5 µs per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
100 loops, best of 3: 15.6 ms per loop

The difference is much starker here: numpy's slowest run took roughly 236 µs (20.50 × 11.5 µs), while pure Python takes 15.6 ms per loop even in its best run!
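%timeit only exists inside IPython. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. The function names numpy_count and python_count, the fixed seed, and the loop counts are assumptions made here for illustration, and the numbers will vary by machine.

```python
import timeit

import numpy as np

n, k = 10_000, 50
rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable
a = np.arange(n)
b = np.arange(n)
a[rng.integers(0, n, k)] = 0    # zero out up to k random positions in each array
b[rng.integers(0, n, k)] = 0


def numpy_count():
    return int(np.count_nonzero(a != b))


def python_count():
    return sum(i != j for i, j in zip(a, b))


# Both approaches must agree before their timings are worth comparing.
assert numpy_count() == python_count()

# repeat() returns one total per repeat; take the minimum and divide by the loop count.
numpy_best = min(timeit.repeat(numpy_count, number=100, repeat=3)) / 100
python_best = min(timeit.repeat(python_count, number=10, repeat=3)) / 10

print(f"numpy:  {numpy_best * 1e6:.1f} us per call")
print(f"python: {python_best * 1e3:.2f} ms per call")
```

Iterating over numpy arrays element by element is itself slower than iterating over plain lists, so the pure-Python figure here is likely somewhat pessimistic for list inputs.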
