
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]

I have to compare these two lists and return, using one line of Python code, the number of positions at which the items differ.

For example, for the given lists the output should be 3, because the items differ at indices 3, 4 and 6. So the program should return 3.

My approach is to compare every position using a for loop:

count = 0
for i in range(len(A)):
    # count the positions where the two lists disagree
    if A[i] != B[i]:
        count += 1
print(count)
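A variant of the loop above that also records which indices differ, not just how many, may help when checking the expected answer by hand. This is a sketch using `enumerate` and `zip`; the name `diff_indices` is just an illustrative choice.

```python
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]

# enumerate(zip(A, B)) yields (index, (item_from_A, item_from_B)) pairs;
# keep the index whenever the two items differ.
diff_indices = [i for i, (x, y) in enumerate(zip(A, B)) if x != y]

print(diff_indices)       # [3, 4, 6]
print(len(diff_indices))  # 3
```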

Please help me write this as one line of Python code.

Nikita Gupta

2 Answers

count = sum(a != b for a, b in zip(A, B))
print(count)

or just print(sum(a != b for a, b in zip(A, B)))
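Why the one-liner works: `bool` is a subclass of `int`, so each `True` adds 1 and each `False` adds 0. One caveat worth knowing, assuming your lists might have different lengths: `zip()` stops at the shorter list, so extra trailing items are silently ignored. The sketch below uses `itertools.zip_longest` with a hypothetical sentinel object `_MISSING` so that those extra items are counted as differences.

```python
from itertools import zip_longest

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]

# True behaves as 1 and False as 0 when added.
print(True + True)  # 2

count = sum(a != b for a, b in zip(A, B))
print(count)  # 3

# Hypothetical sentinel: a fresh object() never compares equal to a real item.
_MISSING = object()
C = [1, 2, 3]
D = [1, 2, 3, 4, 5]
short_count = sum(a != b for a, b in zip(C, D))  # zip ignores 4 and 5
long_count = sum(a != b for a, b in zip_longest(C, D, fillvalue=_MISSING))
print(short_count, long_count)  # 0 2
```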

You can read about zip, lambda and map here; those tools are very powerful and important in Python.
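As an illustration of the map/lambda tools mentioned above, the same count can be written with `map`, which accepts several iterables and walks them in parallel like `zip`. This is a sketch; `operator.ne` is the standard-library function form of `!=`.

```python
import operator

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 2, 3, 7, 4, 6, 5, 8, 9]

# map() with two iterables calls operator.ne(a, b) pairwise,
# producing True/False values that sum() then counts.
count_map = sum(map(operator.ne, A, B))

# The same thing with a lambda instead of operator.ne.
count_lambda = sum(map(lambda a, b: a != b, A, B))

print(count_map, count_lambda)  # 3 3
```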

Here you can also see other ways to use those tools.

Have fun!!


There are many ways to do this. If you're using numpy, you could just use np.count_nonzero:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])
>>> a != b
array([False, False, False,  True,  True, False,  True, False, False], dtype=bool)
>>> np.count_nonzero(a != b)
3

Note that a != b returns a boolean array that is True at each index where the elements differ and False elsewhere; np.count_nonzero then counts the True entries.
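A short sketch of what can be done with that boolean mask: besides `np.count_nonzero`, summing it gives the same count, and `np.flatnonzero` returns the positions of the True entries. The variable name `mask` is only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 7, 4, 6, 5, 8, 9])

mask = a != b                  # boolean array, True where the elements differ
print(np.count_nonzero(mask))  # 3  (counts the True entries)
print(int(mask.sum()))         # 3  (True sums as 1, False as 0)
print(np.flatnonzero(mask))    # [3 4 6]  (indices of the True entries)
```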

Here's a speed comparison:

>>> %timeit np.count_nonzero(a != b)
The slowest run took 40.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 752 ns per loop

>>> %timeit sum(i != j for i, j in zip(a, b))
The slowest run took 5.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 18.5 µs per loop

Possible caching obscures the timing, but even comparing worst cases, 40.59 × 0.752 µs ≈ 30.52 µs for NumPy versus 5.86 × 18.5 µs ≈ 108.41 µs for pure Python, so NumPy's slowest run still appears significantly faster than pure Python's slowest run.

This is much clearer with larger arrays:

>>> n = 10000
>>> a = np.arange(n)
>>> b = np.arange(n)
>>> k = 50
>>> ids = np.random.randint(0, n, k)
>>> a[ids] = 0
>>> ids = np.random.randint(0, n, k)
>>> b[ids] = 0
>>> %timeit np.count_nonzero(a != b)
The slowest run took 20.50 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.5 µs per loop
>>> %timeit sum(i != j for i, j in zip(a, b))
100 loops, best of 3: 15.6 ms per loop

The difference is much starker: NumPy takes at most about 236 µs (20.50 × 11.5 µs) even in its slowest run, while pure Python takes 15.6 ms per loop (best of 3)!
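`%timeit` is an IPython magic, so it only works in IPython or Jupyter. A rough equivalent for a plain Python script, sketched with the standard `timeit` module, is below. Assumptions: the fixed seed (0) and `np.random.default_rng` replace the unseeded `np.random.randint` calls above so results are reproducible, and 10 loops per repeat is an arbitrary choice to keep the pure-Python run short. Absolute timings will differ by machine.

```python
import timeit

import numpy as np

# Rebuild the larger-array setup; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
n, k = 10000, 50
a = np.arange(n)
b = np.arange(n)
a[rng.integers(0, n, k)] = 0
b[rng.integers(0, n, k)] = 0

# Both approaches must agree before their speed is compared.
numpy_count = int(np.count_nonzero(a != b))
python_count = int(sum(i != j for i, j in zip(a, b)))
assert numpy_count == python_count

# timeit.repeat returns the total time of each repeat; take the best repeat
# and divide by the loop count to get seconds per call ("best of 3").
loops = 10
t_numpy = min(timeit.repeat(lambda: np.count_nonzero(a != b),
                            number=loops, repeat=3)) / loops
t_python = min(timeit.repeat(lambda: sum(i != j for i, j in zip(a, b)),
                             number=loops, repeat=3)) / loops

print(f"numpy:  {t_numpy * 1e6:.1f} µs per loop")
print(f"python: {t_python * 1e6:.1f} µs per loop")
```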

Praveen