
I need to check whether every item in one list is also in another list. Both lists contain paths to files.

    list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
    list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']

I tried something like:

    len1 = len(list1)
    len2 = len(list2)

    res = list(set(list2) - set(list1))
    len3 = len(res)

    if len2 - len1 == len3:
        print("List2 contains all the items in list1")

But this isn't optimal, and I have lists of 50k+ items. I think a good solution could be to build a hash table, but I don't know exactly how to do that. If you have any suggestions, please leave a message.

napuzba

2 Answers


Python sets are based on hashing, so you cannot put unhashable objects inside sets (strings such as your paths are hashable, so that is fine here). Rather than calculating lengths, perform the set difference directly:

    >>> list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
    >>> list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
    >>> if not (set(list1) - set(list2)):  # the difference is an empty set (falsy) if all items are contained
    ...     print("List2 contains all the items in list1")
    ...
    List2 contains all the items in list1

Here is the breakdown:

    >>> difference = set(list1) - set(list2)
    >>> difference
    set()
    >>> bool(difference)
    False
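If you also want to know *which* paths are missing, the same difference already holds them. A minimal sketch; the helper name `missing_paths` and the extra path `x/y/missing.txt` are illustrative assumptions, not part of the question:

```python
def missing_paths(list1, list2):
    """Return the items of list1 that do not appear in list2, sorted.

    An empty result means list2 contains every item of list1.
    """
    # The set difference keeps only the elements of list1 absent from list2.
    return sorted(set(list1) - set(list2))


list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'x/y/missing.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']

missing = missing_paths(list1, list2)
if not missing:
    print("List2 contains all the items in list1")
else:
    print("Missing from list2:", missing)
```

Sorting is only there to give a stable, readable report; drop it if you just need the set.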

Sayandip Dutta

> I think a good solution can be by creating a hash table, but I don't know exactly how I could build it.

Sets are already implemented using hash tables, so you are already doing that.
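To make the "hash table" explicit: build a set from `list2` once, then look up each path of `list1` in it. Each `in` test on a set is O(1) on average, and `all()` stops at the first path that is not found. A sketch; `contains_all` is a hypothetical helper name:

```python
def contains_all(needles, haystack):
    """Return True if every item of needles also appears in haystack."""
    # Build the hash-based lookup table once: O(len(haystack)).
    lookup = set(haystack)
    # Each membership test is O(1) on average; all() short-circuits
    # and returns False as soon as one item is missing.
    return all(item in lookup for item in needles)


list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
print(contains_all(list1, list2))  # True
```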

Supposing you don't have (or don't care about) duplicates, you could try:

    list1 = [1, 2, 3]
    list2 = [1, 2, 3, 4]
    set(list1).issubset(list2)  # True

Notice that there's no need to convert list2 to a set yourself, because `issubset()` accepts any iterable (see the comments on this answer).
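If duplicates *do* matter (for example, list1 names the same path twice and list2 must contain it at least twice), a set throws that information away. One option, sketched here as an assumption rather than as part of this answer, is `collections.Counter`, whose subtraction keeps only positive counts; `contains_all_with_counts` is a hypothetical name:

```python
from collections import Counter


def contains_all_with_counts(list1, list2):
    """True if list2 holds each item at least as many times as list1 does."""
    # Counter subtraction drops items whose count falls to zero or below,
    # so the result is empty exactly when list2 covers every item of list1.
    return not (Counter(list1) - Counter(list2))


print(contains_all_with_counts(['a.txt', 'a.txt'], ['a.txt', 'b.txt']))           # False
print(contains_all_with_counts(['a.txt', 'a.txt'], ['a.txt', 'a.txt', 'b.txt']))  # True
```

This is still linear in the total number of items, since building each `Counter` is a single pass.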

EDIT: Both your solution and mine are O(n) on average, and it won't get faster than that. Your solution could, however, skip some operations, such as converting the difference `res` into a list just to get its size.
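To illustrate, here is the question's length check (without the needless `list()` call) next to a direct subset test. One extra caveat, not stated above: the length arithmetic is only reliable when neither list has duplicates, because `set()` silently drops them while `len()` of a list counts them. The function names are illustrative:

```python
def length_check(list1, list2):
    # The question's approach: compare list lengths with the size of the difference.
    res = set(list2) - set(list1)
    return len(list2) - len(list1) == len(res)


def direct_check(list1, list2):
    # The same question, answered directly: is every item of list1 in list2?
    return set(list1).issubset(list2)


list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
print(length_check(list1, list2), direct_check(list1, list2))  # True True

# A duplicate path in list2 throws off the length arithmetic.
list2_dup = list2 + ['d/k/test5.txt']
print(length_check(list1, list2_dup), direct_check(list1, list2_dup))  # False True
```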

naicolas