1

I have two big string list in python. I want to subtract these two list fast with the order of o(n). I found some way like remove second list elements in a loop from first list, or converting list to set() (problem:change order of list)and use minus(-) operator, but these methods are not efficient. Is there any way for doing this operation?

a=['1','2','3',...,'500000']
b=['1','2','3',...,'200000']

c=a-b

c=['200001','200002',...,'500000']
M M
  • 31
  • 7

1 Answers1

1

Your problem, as formulated, is:

  • Go through A
  • For each element, search it in B and take it if it's not found
  • No assumptions about elements is made

For arbitrary data, list search is O(N), set search is O(1), converting to set is O(N). Going through A is O(N).

So it's O(N^2) with only lists and O(N) if converting B to a set.

The only way you can speed it up is to make either iterating or searching more efficient -- which is impossible without using some additional knowledge about your data. E.g.

  • In your example, your data are sequential numbers, so you can take A[len(B):].
  • If you are going to use the same B multiple times, you can cache the set
  • You can make B a set right off the bat (if order needs to be preserved, you can use an ordered set)
  • If all data are of the same type and are short, you can use numpy arrays and its fast setdiff1d
  • etc
ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
  • set() change the order of list, if I want to reorder to main order o(n^2) operation needed. – M M May 11 '19 at 10:28