1

I want to Find whether an array is subset of another array or not and one of the method i can think of doing it is using Hashtable but i want to implement it in python. Attached in the thread is the c++ implementation. I'm not looking for built in functions here like set etc..

Python only has concept of dictionary in terms of hashtables but not sure how to proceed from here. Any suggestions would help me solve it.

Below are couple of lists:

arr1[] = [11, 1, 13, 21, 3, 7]

arr2[] = [11, 3, 7, 1]

Method (c++ Use Hashing)

1) Create a Hash Table for all the elements of arr1[].

2) Traverse arr2[] and search for each element of arr2[] in the Hash Table. If element is not found then return 0.

3) If all elements are found then return 1.

Lists can be million of numbers as well so a scalable and efficiet solution is expected.

user7422128
  • 902
  • 4
  • 17
  • 41

3 Answers3

0

I think you want set e.g. set(arr2).issubset(arr1)

Mark Borgerding
  • 8,117
  • 4
  • 30
  • 51
0

Try this:

i = 0
allIn = True
while i <= len(arr2) and allIn:
    if arr2[i] not in arr1:
        allIn = False
    i += 1

allIn will say whether the second list is in the first.

Note: The other solution using set() works equally well.

EDIT (In response to comments):

I'm not using a for loop as I don't know how to stop the loop from running once allIn is False (I don't know whether using break would work so I'm staying on the safe side).

I'm not using set() as the OP explicitly stated that they don't want to use in-built functions. I have posted my answer as an alternate solution to those answers already provided (but have also commended those as I believe they are better).

Adi219
  • 4,712
  • 2
  • 20
  • 43
  • I'm not going to downvote, but there's a couple of things wrong with this solution. First, the OP clearly cares about efficiency, and was looking for a hash-table based solution. Your current solution is actually O(N * M) because you are using a very inefficient `in` operation between lists. A hash based solution will be O(N). Finally, you should *always* use for-loops when you can over while-loops in python. – juanpa.arrivillaga Mar 06 '18 at 19:22
  • I don't this will give me an efficient solution as in case of hashtables which solves in O(1) time mostly – user7422128 Mar 06 '18 at 19:23
0

In Python, you would use set objects for this:

>>> arr1 =  [11, 1, 13, 21, 3, 7]
>>> arr2 = [11, 3, 7, 1]
>>> set(arr1).issuperset(arr2)
True

Or more efficiently, use:

>>> set(arr2).issubset(arr1)
True

If you expect arr2 to be much smaller...

Some quick timings, it seems that they are about the same in rumtime, although, creating a set from arr1 will require much more auxiliary memory:

>>> import numpy as np
>>> arr1 = np.random.randint(0, 100, (1000000,)).tolist()
>>> len(arr1)
1000000
>>> from timeit import timeit
>>> arr2 = [11, 3, 7, 1]
>>> timeit('set(arr1).issuperset(arr2)', 'from __main__ import arr1, arr2', number=1000)
14.337173405918293
>>> timeit('set(arr2).issubset(arr1)', 'from __main__ import arr1, arr2', number=1000)
14.459818648989312
juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172