1

I have two sorted lists and I need to find the odd ones out in each list.

Currently I am using two list comprehensions with not in:

> if foo != bar:
      in_foo = [i for i in foo if i not in bar]
      in_bar = [i for i in bar if i not in foo]

However, this method doesn't take advantage of the sorted structure of the list.

I can use an alternative method with a counter variable and recursion, but is there a more pythonic way of doing this?

edit: sorted output is preferred. Thanks.

marduk
  • 51
  • 6
  • Can you post a sample input and your expected output? – Selcuk Jun 11 '21 at 02:19
  • 1
    Every possible detail of this is covered here: Does this answer your question? [Get difference between two lists](https://stackoverflow.com/questions/3462143/get-difference-between-two-lists) – Chris Jun 11 '21 at 02:24
  • Thanks @Chris, but as far as I can tell that question covers non-sorted lists. I don't see any real discussion there about methods to take advantage of pre-sorted lists. – marduk Jun 11 '21 at 02:35
  • https://www.geeksforgeeks.org/symmetric-difference-two-sorted-array/ – Chris Jun 11 '21 at 02:51
  • Thank you @Chris. As far as I can tell, that source uses the non-pythonic counter + recursion method (implemented there as a while loop). I'm hoping for a more pythonic solution. – marduk Jun 11 '21 at 02:57
  • @marduk if you aren't interested in the most computationally efficient, then you might as well use `np.setxor1d` and call it a day, it's nearly as fast and was less typing. – Chris Jun 11 '21 at 03:05
  • Thank you @Chris. That is an interesting method I wasn't aware of, but it apparently combines the differences into a 1d array. I need to keep the unique values from each array separate. – marduk Jun 11 '21 at 03:15

3 Answers3

6

Any time you have something like this, it's frequently better to use a set and ignore the sorting (which matters significantly less in Python for small lists than for other programming languages due to the language overhead)

_foo   = set(foo)
_bar   = set(bar)
in_foo = _foo - _bar
in_bar = _bar - _foo
ti7
  • 16,375
  • 6
  • 40
  • 68
  • Note that they want to keep the sorting intact. – Selcuk Jun 11 '21 at 02:21
  • 2
    @SrikrishnaSharma How so? `set`s are unordered. – Selcuk Jun 11 '21 at 02:22
  • well.. in such a case, I would still be tempted to use a `set` and keep or discard afterwards .. however, they don't say they want a sorted output in the Question - as @Selcuk notes `set`s are unordered and so should not be assumed to have any particular order, including something like the original one! – ti7 Jun 11 '21 at 02:24
  • True, they don't say they want a sorted output in the question but the title of the question is "Contrasting two sorted lists in Python". It's a bit unclear. – Selcuk Jun 11 '21 at 02:27
  • Thank you @ti7, I considered using set difference but then I'd have to convert to set, and afterwards re-sort (I need a sorted output). I didn't know if making one step into three was more costly than efficient. – marduk Jun 11 '21 at 02:38
  • @marduk anytime! I'd use one of the dupe-marked answers then; 'should have one of everything! – ti7 Jun 11 '21 at 02:46
  • Note that using a `set` will ignore duplicates. For instance with `foo = [2, 3, 3, 4]` and `bar = [2, 3, 4]`, the set approach will ignore the extra `3` in `foo`. – Stef Jun 11 '21 at 08:22
1

Here is a comparison of three methods. The with_intersection method allows for repeated values within each list, the other two do not. The test considers two sorted lists, each with one million distinct integers.

The using_sorted method takes advantage of the fact that both lists are sorted and does not use sets. At the same time, it is the slowest, most verbose and error-prone.

import numpy as np # only for data generation

lst1 = np.random.randint(1, 20, 10**6).cumsum().tolist()
lst2 = np.random.randint(1, 20, 10**6).cumsum().tolist()

def with_intersection(lst1, lst2):
  common = set(lst1).intersection(lst2)
  res1 = [x for x in lst1 if x not in common]
  res2 = [x for x in lst2 if x not in common]
  return res1, res2

def set_then_sort(foo, bar):
  _foo   = set(foo)
  _bar   = set(bar)
  in_foo = _foo - _bar
  in_bar = _bar - _foo
  return sorted(in_foo), sorted(in_bar)

def using_sorted(lst1, lst2):
  res1 = list()
  res2 = list()
  n1 = len(lst1)
  n2 = len(lst2)
  i = j = 0
  while True:
    while i < n1 and j < n2 and lst1[i] < lst2[j]: 
      res1.append(lst1[i])
      i += 1
    while j < n2 and i < n1 and lst1[i] > lst2[j]:
      res2.append(lst2[j])
      j += 1
    while i < n1 and j < n2 and lst1[i] == lst2[j]:
      i += 1
      j += 1
    if i == n1:
      res2.extend(lst2[j:])
      break
    elif j == n2:
      res1.extend(lst1[i:])
      break
  return res1, res2
      
assert with_intersection(lst1, lst2) == set_then_sort(lst1, lst2) == using_sorted(lst1, lst2)

# %timeit with_intersection(lst1, lst2) # 306 ms
# %timeit set_then_sort(lst1, lst2)     # 491 ms
# %timeit using_sorted(lst1, lst2)      # 870 ms 
hilberts_drinking_problem
  • 11,322
  • 3
  • 22
  • 51
0

By putting a None trailer at the end of each list, we can use iterators and focus on recording the differences. This algorithm is O(n+m):

foo = [1, 3, 4, 5]
bar = [2, 4, 5, 6]

i_foo = iter(foo + [None])
i_bar = iter(bar + [None])

n_foo = next(i_foo)
n_bar = next(i_bar)
in_foo = []
in_bar = []
while True:
    if n_foo is None:
        if n_bar is None:
            break
        in_bar.append(n_bar)
        n_bar = next(i_bar)
        continue
    if n_bar is None:
        in_foo.append(n_foo)
        n_foo = next(i_foo)
    if n_foo == n_bar:
        n_foo = next(i_foo)
        n_bar = next(i_bar)
        continue
    if n_foo < n_bar:
        in_foo.append(n_foo)
        n_foo = next(i_foo)
    else:
        in_bar.append(n_bar)
        n_bar = next(i_bar)
print(in_foo, in_bar)
# [1, 3] [2, 6]
VirtualScooter
  • 1,792
  • 3
  • 18
  • 28