-3

I have a bunch of lists in the form of say [0,0,1,0,1...], and I want to take the XOR of 2 lists and give the output as a list. Like: [ 0, 0, 1 ] XOR [ 0, 1, 0 ] -> [ 0, 1, 1 ]

res = []
tmp = []

for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:

        tmp = [i[index] ^ j[index] for index in range(len(i))]
        res.append(temp)

The size of each of my lists / vectors is around 3500 elements, so I need something to save time, since this piece of code is taking more than 20 mins to run.

I have 3085 lists, each of which need an XOR operation with 4089 other lists.

How do I do this without iterating through each list explicitly?

martineau
  • 119,623
  • 25
  • 170
  • 301
  • 2
    What have you tried? Where is the code you used? – Error - Syntactical Remorse Jun 03 '19 at 17:36
  • 1
    _"so I need something to save time"_ - no, you don't, at least not until you've done something and seen that it was too slow. Before you try it, it is difficult to even guess the order of magnitude of the time it takes to do that (is it milliseconds, or microseconds, or seconds, or minutes?) – zvone Jun 03 '19 at 17:39
  • Ok, I am marking this as duplicate as a simple XOR python search gave me the above link. You can easily use a for loop or a list comprehension – Sheldore Jun 03 '19 at 17:42
  • No matter what implementation you select, the underlying mechanics of the implementation will iterate through both lists. You can't avoid this. This is a fundamental aspect of programming. Furthermore, 3500 elements is absolutely nothing. Even PHP, being memory inefficient due to all arrays actually being hashmaps, handles arrays with 3500 elements in near-instant time. There is no reason whatsoever for you to be concerned with runtime efficiency here. – B. Fleming Jun 03 '19 at 17:45
  • Actually I have 2 sets of lists: First Set having 3085 lists and Second Set having 4089 lists, and I need to take the XOR of each of the 3085 lists with each of the 4089 lists. Hence it is taking a lot of time ( over 20 mins ) to run the code, and I want to optimize it in some way – Ankit Billa Jun 03 '19 at 17:49
  • @AnkitBilla Than use NumPy. It is designed for stuff like that. – Error - Syntactical Remorse Jun 03 '19 at 17:58

3 Answers3

0

Assuming a and b are the same size you can use the xor operation (i.e. ^) with simple list indexing:

a = [0, 0, 1]
b = [0, 1, 1]
c = [a[index] ^ b[index] for index in range(len(a))]
print(c) # [0, 1, 0]

or you can use zip with the xor:

a = [0, 0, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip(a, b)]
print(c) # [0, 1, 0]

zip will only go to the shortest list (if they are not the same size). If they are not the same size and you want to go to the longer list you can use zip_longest:

from itertools import zip_longest
a = [0, 0, 1, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
print(c) # [0, 1, 0, 1]
0

Use map:

answer = list(map(operator.xor, lst1, lst2)).

or zip:

answer = [x ^ y for x,y in zip(lst1, lst2)]

If you need something faster, consider using NumPy instead of Python lists to hold your data.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • How would your code have to be changed if I had numpy arrays instead of lists? – Ankit Billa Jun 03 '19 at 17:57
  • If you have NumPy arrays, you might be able to simply write something like `answer = arr1 ^ arr2`; many binary operators are simply defined by NumPy to do pointwise operations when the arguments are both arguments. – chepner Jun 03 '19 at 18:41
  • Based on your edit, though, you don't want pointwise operations, you want to perform an XOR on each element of the *product* of your two lists, which is inherently an O(n^2) operation. NumPy won't be able to speed that up significantly. – chepner Jun 03 '19 at 18:43
0

Using numpy you should have some performance gains, the function you need is bitwise_xor, like so:

import numpy as np
results = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        results.append(np.bitwise_xor(i, j))

A proof of concept:

a = [1,0,0,1,1]
b = [1,1,0,0,1]
x = np.bitwise_xor(a,b)
print("a\tb\tres")
for i in range(len(a)):
    print("{}\t{}\t{}".format(a[i], b[i], x[i]))

output:

a       b       x
1       1       0                                                                                                       
0       1       1                                                                                                       
0       0       0                                                                                                       
1       0       1                                                                                                       
1       1       0

Edit

Note that if your arrays have the same size, you can simply do one operation and the bitwise_xor will still work, so:

a = [[1,1,0], [0,0,1]]
b = [[0,1,0], [1,0,1]]
res = np.bitwise_xor(a, b)

will still work, and you'll have:

res: [[1, 0, 0], [1, 0, 0]]

In your case, a workaround would possibily be:

results = []
n = len(Course_Specific_Vocabulary_Dict['Binary Vector'])
for a in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    # Get same size array w.r.t Course_Specific_Vocabulary_Dict["Binary Vector]
    repeated_a = np.repeat([a], n, axis=0)
    results.append(np.bitwise_xor(repeated_a, Course_Specific_Vocabulary_Dict['Binary Vector']))

However I don't know if that would actually improve performance, it is to be checked; for sure it will require some more memory.

DLM
  • 555
  • 5
  • 10
  • Hi. Upon running your code snippet, I received the following error: ValueError: operands could not be broadcast together with shapes (3085,3761) (3085,) – Ankit Billa Jun 04 '19 at 04:58
  • Which of the code snippets I wrote? Also, which is the shape of your arrays? I have just tried them and they are working. Remember that to perform such operations they need to have same length. – DLM Jun 04 '19 at 06:54