better way to extract item of list A which contain ID from list B?

Question

my question is about algorithms for extract data from 2 list of instances which share same ID. Here is a example: ClassA has 2 attributes: ID and info, classB has 1 attribute: ID

listA         listB
instanceA1    instanceB1
instanceA2    instanceB2
instanceA3
instanceA4

what I did is very simple

IDs_listB = [l.ID for l in listB]  
out_tab=[] 
for line in listA:
        if line.ID in IDs_listB:
            out_tab.append(line)

But this way become very slow and heavy for large data, if listA has N lines and M lines for listB, O(N*M) the total complexity is too much......

I am thinking of cython or a automate, but is there any way better to so this job?

thank you

Welcome to SO. Please provide a **[mcve]**. In this example, it's not clear what types your variables are. — jpp, Apr 09 '18 at 10:17
You probably want to use [sets](https://docs.python.org/2/library/sets.html) — gogaz, Apr 09 '18 at 10:18

score 0 · Answer 1 · answered Apr 09 '18 at 10:21

You can use set instead of ListB as set uses hashing and faster with in keyword as compared to list.

OR

Make a dictionary of ListB and use ID as key and True as value and search like this.

for line in listA: if dictB.get(line.ID): extract(line)

score 0 · Accepted Answer · answered Apr 09 '18 at 10:22

0

you can use Counter (O(n+m)) like this

from collections import Counter
lst1 = ["1","2","3"]
lst2 = ["1","3","4"]

c = Counter(lst1 + lst2)
print [i for i in c if c[i] > 1]

output

['1', '3']

answered Apr 09 '18 at 10:22

shahaf

4,750
2
29
32

score 0 · Answer 3 · answered Apr 09 '18 at 12:05

0

Using Sets you can improve your time complexity to O(n) in average case and O(m*n) in worst case where the size of lists are n, m respectively.

l1 = [1,2,3]
l2 = [2,4,6]
s = set(l1)
l3 = [i for i in l2 if i in s]
print(l3) // [2]

Average Time complexity for lookup in set is O(1) where as it reaches to O(N) in Worst Case. Referenence Complexity of *in* operator in python.

answered Apr 09 '18 at 12:05

Ishpreet

5,230
2
19
35

thank you, but in my case, my ids are not simple number , they like this : RPS4XP5,AC019181.2,NEUROD1,FRG2FP,RPL9P5. – boat Apr 12 '18 at 13:23
@boat I guess They are strings, So what's the issue you are facing? – Ishpreet Apr 12 '18 at 13:40

better way to extract item of list A which contain ID from list B?

3 Answers3