3

I have a list of lists like this:

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]

I also have a query list like this:

queryList = ["abc","cccc","abc","yyy"]

queryList & listt[0] contain 2 "abc" in common.

queryList & listt[1] contain 1 "abc", 1 "cccc" & 1 "yyy" in common.

So I want an output like this:

[2,3] #2 = Total common items between queryList & listt[0]
      #3 = Total common items between queryList & listt[1]

I am currently using loops to do this, but this seems to be slow. I will have millions of lists, with thousands of items per list.

listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]

totalMatch = []
for hashtree in listt:
    matches = 0
    tempQueryHash = queryList.copy()  # fresh copy so each list is matched independently
    for item in hashtree:  # renamed from `hash` to avoid shadowing the built-in
        for i in range(len(tempQueryHash)):
            if tempQueryHash[i] == item:
                matches += 1
                tempQueryHash[i] = ""  # Don't match the same block twice.
                break

    totalMatch.append(matches)
print(totalMatch)
David Buck
Rahul
  • But `listt[0]` contains 3 `abc`. Should the output then be `[3, 3]`? – mkrieger1 Apr 10 '20 at 14:25
  • _millions of lists, with thousands of items per list_ If you have that much data, you might want to use something other than lists. Lists must use sequential access to check if an item is present, which is very slow. sets and dicts use hashing which is much faster. – John Gordon Apr 10 '20 at 14:28
  • That will be several gigabytes of data, so besides lists, you might also want to use something other than Python... – Thomas Apr 10 '20 at 14:30
  • @mkrieger1 But the query list only contains two "abc", so there are only two matches. – Rahul Apr 10 '20 at 14:31
  • Which format do you recommend, @JohnGordon? Dicts? – Rahul Apr 10 '20 at 14:32
  • @Thomas My data is in MySQL, so maybe I should look for an SQL solution, right? Damn, dumb me. – Rahul Apr 10 '20 at 14:33
  • Does this answer your question? [Intersection of two lists including duplicates?](https://stackoverflow.com/questions/37645053/intersection-of-two-lists-including-duplicates) – mkrieger1 Apr 10 '20 at 14:34
  • Well yes! A database is designed to solve such problems efficiently. – mkrieger1 Apr 10 '20 at 14:35

3 Answers

2

Well, I'm still learning the ropes in Python, but according to an older post on Stack Overflow, something like the following should work:

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
OutputList = [len(list((Counter(x) & Counter(queryList)).elements())) for x in listt]
# [2, 3]

I'll keep a lookout for some other method...
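To make the one-liner above easier to follow, here is a small sketch of what the `&` operator does on two `Counter` objects: it keeps each key at the minimum of its two counts, which is exactly the "don't match the same item twice" rule from the question. The names `query_counts`, `first_counts`, `common` and `total_common` are only illustrative.

```python
from collections import Counter

query_counts = Counter(["abc", "cccc", "abc", "yyy"])
first_counts = Counter(["a", "abc", "zzz", "xxx", "abc", "abc"])

# `&` is multiset intersection: each key keeps min(count_left, count_right),
# and keys whose minimum is zero are dropped.
common = first_counts & query_counts
print(common)        # Counter({'abc': 2})

# Summing the remaining counts gives the number of common items.
total_common = sum(common.values())
print(total_common)  # 2
```

Here "abc" appears three times in the first list but only twice in the query, so the intersection keeps two.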

JvdV
2

An improvement on JvdV's answer.

Instead of expanding the intersection into a list and taking its length, sum the counts directly, and build the Counter for queryList only once rather than once per list.

from collections import Counter
listt = [["a","abc","zzz","xxx","abc","abc"],["yyy","ggg","abc","cccc"]]
queryList = ["abc","cccc","abc","yyy"]
queryListCounter = Counter(queryList)
OutputList = [sum((Counter(x) & queryListCounter).values()) for x in listt]
print(OutputList)  # [2, 3]
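If building a full `Counter` for every list turns out to be the bottleneck, one possible variation is to walk each list once and spend down a per-list copy of the query counts. This is only a sketch: the helper name `count_common` and the variable names are made up for illustration, and whether it is actually faster than the `Counter` intersection would need to be measured on the real data.

```python
from collections import Counter

def count_common(items, query_counts):
    """Count items shared with the query, matching each query item at most once."""
    remaining = dict(query_counts)  # per-list copy of how many matches are still allowed
    matches = 0
    for item in items:
        if remaining.get(item, 0) > 0:
            remaining[item] -= 1
            matches += 1
    return matches

listt = [["a", "abc", "zzz", "xxx", "abc", "abc"], ["yyy", "ggg", "abc", "cccc"]]
queryList = ["abc", "cccc", "abc", "yyy"]

query_counts = Counter(queryList)  # built once, reused for every list
result = [count_common(x, query_counts) for x in listt]
print(result)  # [2, 3]
```

Only the query's counts are copied per list, so the extra memory per list depends on the size of the query rather than on the size of the list.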
Yosua
0

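You can compare every item of a list in listt with every item of queryList and count the matching pairs. Note that this counts pairs, so an item that is duplicated in queryList is counted once per duplicate: for listt[1] it gives 4 rather than the 3 asked for, because "abc" occurs twice in queryList.

output = [i == z for i in listt[1] for z in queryList]
print(output.count(True))  # 4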
zebsy