-2

my question is about algorithms for extract data from 2 list of instances which share same ID. Here is a example: ClassA has 2 attributes: ID and info, classB has 1 attribute: ID

listA         listB
instanceA1    instanceB1
instanceA2    instanceB2
instanceA3
instanceA4

what I did is very simple

IDs_listB = [l.ID for l in listB]  
out_tab=[] 
for line in listA:
        if line.ID in IDs_listB:
            out_tab.append(line)

But this way become very slow and heavy for large data, if listA has N lines and M lines for listB, O(N*M) the total complexity is too much......

I am thinking of cython or a automate, but is there any way better to so this job?

thank you

boat
  • 3
  • 4

3 Answers3

0

You can use set instead of ListB as set uses hashing and faster with in keyword as compared to list.

OR

Make a dictionary of ListB and use ID as key and True as value and search like this.

for line in listA: if dictB.get(line.ID): extract(line)

0

you can use Counter (O(n+m)) like this

from collections import Counter
lst1 = ["1","2","3"]
lst2 = ["1","3","4"]

c = Counter(lst1 + lst2)
print [i for i in c if c[i] > 1]

output

['1', '3']
shahaf
  • 4,750
  • 2
  • 29
  • 32
0

Using Sets you can improve your time complexity to O(n) in average case and O(m*n) in worst case where the size of lists are n, m respectively.

l1 = [1,2,3]
l2 = [2,4,6]
s = set(l1)
l3 = [i for i in l2 if i in s]
print(l3) // [2]

Average Time complexity for lookup in set is O(1) where as it reaches to O(N) in Worst Case. Referenence Complexity of *in* operator in python.

Ishpreet
  • 5,230
  • 2
  • 19
  • 35
  • thank you, but in my case, my ids are not simple number , they like this : RPS4XP5,AC019181.2,NEUROD1,FRG2FP,RPL9P5. – boat Apr 12 '18 at 13:23
  • @boat I guess They are strings, So what's the issue you are facing? – Ishpreet Apr 12 '18 at 13:40