0

I have array one: a1 = [1,2,3], and array two: a2 = [1,2,3]. I have to compare these two arrays and remove all similar elements from a1. At the same time I have to return a1 and I can not use additional libraries these are requirements. I'm trying to do it in such way:

    for i in a1:
        for j in a2:
            if i == j:
                a1.remove(j)

The problem here that it returns [2] instead of [] (empty array)

PS. I know about the similar question, but I couldn't find solution there How to remove items from a list while iterating?

JohnPix
  • 1,595
  • 21
  • 44

6 Answers6

2

I would be using Sets for this

>>> a1 = [1,2,3,4]
>>> a2 = [1,2,3,4]
>>> a1 = list(set(a1) - set(a2))
>>> a1
[]
>>> a1 = [1,2,3,4,5]
>>> a1 = list(set(a1) - set(a2))
>>> a1
[5]

This maybe inefficient for smaller datasets but for larger ones ive found it far more efficient (based on time).

Timed run using timeit and time. Using a method in another answer

time python -m timeit  -n 10 -s "a1 = range(100000); a2=range(100000) " -s "for i in a1:"  -s "    if i in a2: a1.remove(i)"
10 loops, best of 3: 0.0954 usec per loop

real    2m34.144s
user    2m32.468s
sys 0m0.468s

Timed run using the method above

time python -m timeit -n 10 "a1 = range(1000000); a2=range(1000000) ; a1 = list(set(a1) - set(a2)) ; "
10 loops, best of 3: 109 msec per loop

real    0m3.453s
user    0m3.090s
sys 0m0.350s
tomgalpin
  • 1,943
  • 1
  • 4
  • 12
  • That sounds quite slow. Instantiating 2 sets, along with a new list instead of filtering the exiting one directly in place sounds very inefficient to me. – Jordan Brière Jan 30 '20 at 08:22
  • Updated to reflect your input – tomgalpin Jan 30 '20 at 08:46
  • 1
    Interesting results. I'm sure the first timed code could be greatly improved by removing the duplicated lookup (once through `__contains__`, then another through `remove`) by using a `try/except` block but I doubt it can come as close as the second one. Which is most likely faster because of the internal hashing, which actually raise an issue with this solution as it would raise if any element contained into `a1` is unhashable (such as a dictionary, etc). Nonetheless, interesting results. Thanks! – Jordan Brière Jan 30 '20 at 09:11
  • 1
    @JordanBrière, thank you for making the initial point. Python has always been that language which has multiple different ways to do many tasks and people stick to those ways without taking performance into account when scaling. Additionally i hadn't thought about your point regarding hashing, ill keep that in mind, thanks. – tomgalpin Jan 30 '20 at 09:18
2

Try this:

for i in a1:
  if i in a2:
    a1.remove(i)
Kishan
  • 1,630
  • 13
  • 19
1

There's a hundred ways to solve this, but this one is my favorite:

a1 = list(filter(lambda e: e not in a2, a1))
fixmycode
  • 8,220
  • 2
  • 28
  • 43
1

You can probably do it by the following way:

a1 = [x for x in a1 if x not in a2]
J.Jai
  • 597
  • 1
  • 5
  • 9
1

The reason you would get [2] was that you were removing elements from the a while continuing iterating a which was very likely to always produce problems.

This is what we expected:

# sudo code
a[0] == b[0] ? if not then continue iterating array b; if yes, a.remove(b[0])
a[0] == b[1] ? if not then continue iterating array b; if yes, a.remove(b[1])
...

Well, but the program would not do as what we expected :(

Presumably, this has something to do with the mechanism of how for loop worked in python. I did some brief research on this but did not find an exact answer to this. However, according to the result being [2], what was likely to happen underground was the following (Disclaimer: I didn't look up the source code, and I encourage you to, so the following deduction might be false...)

Since we all know Python is from C, so Python's for loop is just a shorthand of C's. In C, normally we iterate by index, like for (int index=0; index < size; index++). In Python we seemingly have omitted the index and only produced the values, but very likely Python's for loop was still run under a for loop with index, which was just we don't see it there. The result of this mechanism was that when you executed a.remove(1) in the first iteration, you have shortened the length of a to 2, while the index was incrementing, from 0 to 1. During the second iteration, the program was looking for a[1] where 1 was the index incremented from 0. Since you have removed the first element of a, the current a[1] has become the previous a[2], which pointed to the value 3. Then, you compare the 3 in a with all the values in b again, then definitely it could find a match and would remove 3 as well. Now it has broken the condition index < size (in fact, index has been greater than size because the size has been reduced to 1 after two removes, and the index would increment from 1 to 2). So, the iteration was called a stop. Now, when you evaluated a it would produce [2], because the original a[1], the 2, was skipped over the process.

A sample code that could illustrate this is the following, which I have tried:

a = [1, 2, 3]
for n in a:
    print(n)
    a.remove(n)

You would see only 1 and 3 printed, which proved that 2 was indeed skipped.

Natarich J
  • 407
  • 2
  • 5
  • 17
1

This can be better explained by the following code :-

for i in a1:
    print('a1 = ', a1)
    print('i = ', i)
    if i in a2:
        a1.remove(i)
        print('After removing, a1 = ', a1)

The above code outputs

a1 =  [1, 2, 3]
i =  1
After removing, a1 =  [2, 3]
a1 =  [2, 3]
i =  3
After removing, a1 =  [2]

As can be seen, after the first removal of 1, now i becomes 3, and not 2. This is because Python, at the start of the loop, fixes the values of the iterator. In this case, it fixes i from 0 to 2, but since a1.remove() decreases the length of a1, it should check 2, but it is checking 1st index, i.e., 3. i = i - 1 cannot be put here as it won't work. To accomplish this, the following code can be used

i = 0
while i < len(a1):
    if a1[i] in a2:
        a1.remove(a1[i])
        i = i - 1
    i = i + 1
Swati Srivastava
  • 1,102
  • 1
  • 12
  • 18