Why is this python loop containing consecutive duplicate items breaking?

Question

I am attempting solve this problem. In the problem am required to iterate over a list of directions (NORTH, SOUTH, EAST, WEST) and discard any adjacent opposite directions (NORTH and SOUTH, EAST and WEST) to return a reduced array containing only non-redundant directions. When I iterate over a list that does not contain consecutive duplicates, such as ["NORTH", "SOUTH", "SOUTH", "EAST", "WEST", "NORTH", "WEST"], my code works fine, but it breaks when I iterate over a list with consecutive duplicates, like ['EAST', 'EAST', 'WEST']. Why is my code doing this, and how can I fix it to handle consecutive duplicates?

def dirReduc(arr):
  for direction in arr[:-1]:
    try:
      gotdata = arr[arr.index(direction)+1] 
    except IndexError:
      gotdata = 'null'
    except ValueError:
      gotdata = 'null'
    if gotdata is 'null':
      if arr == dirReduc(arr):
        return arr
      else:
        return dirReduc(arr)
    elif cancel_pair(direction, arr[arr.index(direction)+1]):
      del arr[arr.index(direction):arr.index(direction)+2]
  return arr

def cancel_pair(dir1, dir2):
  if dir1 in ('NORTH') and dir2 in ('SOUTH') or dir1 in ('SOUTH') and dir2 in ('NORTH'):
    return True
  elif dir1 in ('WEST') and dir2 in ('EAST') or dir1 in ('EAST') and dir2 in ('WEST'):
    return True
  return False

your non-consecutive duplicate list contains consecutive duplicates "SOUTH","SOUTH" or am i missing something — PYA, Jul 07 '17 at 19:04
Why do you consider `['EAST', 'EAST', 'WEST']` to have consecutive duplicates but not `["NORTH", "SOUTH", "SOUTH", "EAST", "WEST", "NORTH", "WEST"]`? — user94559, Jul 07 '17 at 19:04
Could you please explain what this is _supposed_ to do? it is not obvious to me. Also, your example of a "list that does not contain consecutive duplicates" has two SOUTHs in a row. Also also, please show the code for `cancel_pair`. — zwol, Jul 07 '17 at 19:04
The code is supposed to iterate over the list, then discard opposite directions (`WEST` and `EAST`, `NORTH` and `SOUTH`), then return a list with no redundant directions. It is breaking in the manner in which it iterates over the list. — d00medman, Jul 07 '17 at 19:06
What? What do you mean "discard opposite directions"? Can you explain *exactly* what you are trying to do? Your approach seems extremely convoluted, and probably inefficient (you keep using the `.index` method, for example). — juanpa.arrivillaga, Jul 07 '17 at 19:07
Is it supposed to discard opposite directions only if they appear right next to each other (so `WEST NORTH EAST` would be unmodified) or if there's any pairs at all (`WEST NORTH EAST` becomes just `NORTH`)? — zwol, Jul 07 '17 at 19:07
It breaks in that it is supposed to move on to the next `direction` in the list, but it basically gets stuck on arr[0], meaning it cannot properly pass the intended values to a helper function (which I will add) — d00medman, Jul 07 '17 at 19:07
@zwol no, it is only supposed to discard the ones which are directly adjacent to each other. so `W N E` would be left as is, but `W E N` would be reduced to `N` — d00medman, Jul 07 '17 at 19:09
@juanpa.arrivillaga I am trying to solve [this](https://www.codewars.com/kata/directions-reduction/train/python) CodeWars problem. I have it working for five of the six tests, but the sixth one breaks, largely due to the conditions I have described. TBH it's no surprise that my code is inefficient, I'm pretty new to python. If you have any advice on making my code more efficient, I would love to hear it. — d00medman, Jul 07 '17 at 19:11
so if there are opposite directions adjacent to each other they are discarded? @d00medman — PYA, Jul 07 '17 at 19:11
@pyjg yes, and they do not factor into the next comparison in the same loop (hence the `del` statement) — d00medman, Jul 07 '17 at 19:12
This will not work as you expect. `if gotdata is 'null':` Use `==` to test if two strings are equal. `is` is used to test if two objects have the same identity. — Håken Lid, Jul 07 '17 at 21:38

score 4 · Accepted Answer · answered Jul 07 '17 at 19:33

4

A for loop is not a good match for this problem. If you delete a pair of items, you may need to backtrack to see if a new pair was created. A while loop is much more natural:

opposite_directions = {
    "NORTH": "SOUTH",
    "SOUTH": "NORTH",
    "EAST": "WEST",
    "WEST": "EAST"
}

def dirReduc(arr):
    i = 0
    while i < len(arr) - 1:
        if arr[i] == opposite_directions[arr[i+1]]:
            del arr[i:i+2]
            if i > 0:
                i -= 1 # back up a step if we just deleted some moves from the middle
        else:
            i += 1
    return arr

I also replaced your cancel_pairs function with a simple dictionary lookup. Python's dictionaries are great, and they're often a better choice than a complicated if/else block.

answered Jul 07 '17 at 19:33

Blckknght

100,903
11
120
169

This worked well. I think the problem ultimately came from a flawed understanding of how the index method works. Using a while loop like this which manually tracked the index was the viable solution (it also is the solution to a broader conceptual problem I've been having, which is how to iterate over a loop in blocks without throwing errors at the end). Thank you very much! – d00medman Jul 07 '17 at 19:40
Nice insight that you only have to back up one spot after removing a pair! – user94559 Jul 07 '17 at 19:44
Yeah, while it's *often* a bad approach in Python to do iteration like you would in C (with `for i in range(len(arr))` is code smell, for instance), there's no a universal rule that using indexes is bad. Sometimes it is the most natural way to go. Note that other times there are neat iterator tricks you can use, like `for tup in zip(*[iter(arr)]*n)` to iterate over a sequence `arr` taking `n` items at a time. – Blckknght Jul 07 '17 at 19:46
I strongly prefer iterating over indices versus that particular iterator trick. – user94559 Jul 07 '17 at 19:49
You could even manage to remove entire nests of pairs in one `del` operation if you tracked 'the current move' and 'the move it matches' independently. – zwol Jul 07 '17 at 20:14
@zwol: That's hard to do. Consider `['NORTH', 'EAST', 'WEST', 'EAST', 'WEST', 'SOUTH']`, which needs to reduce to nothing. I can't see any obvious way of recognizing that the whole thing could be deleted in one go. – Blckknght Jul 07 '17 at 21:22

user94559 · Answer 2 · 2017-07-07T19:55:29.330

EDIT

I'm leaving this here because I think the explanation and the recursive version are helpful, but @Blckknght's answer is superior in that it does less work (by backing up only one element rather than restarting with each iteration).

Here's some working code. I realize it's a bit of a departure from your code. (You can ignore the cancel_pair implementation; I wrote my own before I saw yours. Yours looks fine.)

def cancel_pair(dir1, dir2):
    return tuple(sorted((dir1, dir2))) in (("NORTH", "SOUTH"), ("EAST", "WEST"))

def dirReduc(directions):
    did_something = True
    while did_something:
        did_something = False
        for i in range(len(directions) - 1):
            if cancel_pair(directions[i], directions[i + 1]):
                did_something = True
                directions = directions[:i] + directions[i + 2:]
                break
    return directions

I think there are a couple issues with your code:

You're modifying the list of directions while you're iterating over it. This is generally a bad idea, and specifically here, I think it may cause you to skip over elements (because the length of the list is changing).
You're only taking one pass through the array. In the problem you linked to on Codewars, you need to repeat the process after removing a canceling pair.
You're using index to figure out the current index, but it will always tell you the index of the first occurrence of the value you're passing in. You need to instead keep track of the index yourself. I chose to just iterate over the indices, since I need to use the index + 1 anyway.

UPDATE

A recursive approach might be easier to understand, depending on your programming background:

def dirReduc(directions):
    for i in range(len(directions) - 1):
        if cancel_pair(directions[i], directions[i + 1]):
            return dirReduc(directions[:i] + directions[i + 2:])
    return directions

No. 3 Was more or less what I had guessed. So it would make more sense to use a while loop to manually handle the stepping through the list. — d00medman, Jul 07 '17 at 19:29
I used a `for` loop to step through the list and a `while` loop repeat the process. Try running my code and printing out `directions` at the top of the `while` loop to see what happens. — user94559, Jul 07 '17 at 19:30

PYA · Answer 3 · 2017-07-07T20:26:29.700

0

opp = {}
opp['NORTH'] = 'SOUTH'
opp['EAST'] = 'WEST'
opp['SOUTH'] = 'NORTH'
opp['WEST'] = 'EAST'

def red(lst):
  i = 0
  j = 1
  while j< len(lst):
    if(opp[lst[i]] == lst[j]):
      lst[i] = 0
      lst[j] = 0
      lst = [x for x in lst if x != 0] 
      i =0
      j =1
    else:
      i+=1 
      j+=1

  print(i,j)

this is a cleaner way to do it, I store the opposite directions and then iterate over the list making a new copy every time i 'pop' opposite elements.

keep in mind that the input list will be modified by this code so you might want to pass a copy of the input list if needed.

edited Jul 07 '17 at 20:26

answered Jul 07 '17 at 19:29

PYA

8,096
3
21
38

This is a good algorithm overall, but downvoted for quadratic time complexity. You do not need to start over every time you find a match. – zwol Jul 07 '17 at 19:31
Was just trying to keep it simple for OP... didnt know this was an interview :P – PYA Jul 07 '17 at 19:31
1

@zwol You *do* need to start over every time you find a match. In the (admittedly poorly written) Codewars description of the problem, you only remove adjacent pairs, but then you repeat. So ENWSE --> ENWSE, because there are no adjacent canceling pairs, but ENWES --> E. (That takes two repetitions... ENWES --> ENS --> E.) – user94559 Jul 07 '17 at 19:34
@smarx thank you, was about to comment the same thing after further analysis. Can i be un-downvoted? – PYA Jul 07 '17 at 19:34
It's still not necessary to start over each time a pair is found, only to iterate until there are no further pairs to remove; see my answer. – zwol Jul 07 '17 at 19:36
@zwol can you please elaborate? cant understand your answer :/ – PYA Jul 07 '17 at 19:42
I think you should work it out for yourself. You will learn more that way. – zwol Jul 07 '17 at 19:43
1

@zwol Your answer has the same time complexity as this one. You have an inner loop (executed by the regular expression engine, in your case) that starts over each time at the beginning of the list (string, in your case) and then stops when there's nothing left to do. – user94559 Jul 07 '17 at 19:53
@smarx Are you saying `re.sub` _internally_ starts over from the beginning of the string after making each replacement? Because that's not what it's documented to do, and if it did that the outer loop would be unnecessary. But if it behaves as documented -- if it continues scanning _after_ the point of the match, after performing the replacement -- then my algorithm is O(words*pairs), which is the best you can do for this, not O(words^2). [I will grant that the string bashing may bring it back up to quadratic, but in a HLL you usually just live with that.] – zwol Jul 07 '17 at 19:57
1

@zwol I believe this solution is also O(words * pairs). It proceeds through the entire list until it finds a pair, and then it restarts at the beginning of the list. So there are (pairs + 1) passes through the list (like in your code). – user94559 Jul 07 '17 at 20:01
1

@zwol Your code may take fewer passes if it can find multiple pairs at once, but then again, it has to walk further through the list/string to do that. In terms of big-O notation, they're going to be the same. In terms of actual running time, it will really depend on the type of input. For some inputs, this method is faster (e.g. NNNSSS), and for some yours is faster (e.g. NSNSNS). – user94559 Jul 07 '17 at 20:05
@pyjg FYI, you should take a look at how the other answers handle removing the pair. Your zeroing out and then using list comprehension is a bit awkward. – user94559 Jul 07 '17 at 20:07
@smarx I was trying to make a new copy instead of deleting items from the list thats why I did that, but yes, it isnt the best way to do that, I agree. – PYA Jul 07 '17 at 20:09
@smarx On further reflection "O(words*pairs)" was the wrong way to put it. Both algorithms are _worst case_ O(words^2), but the thing is that this algorithm is also _average case_ O(words^2) because it has to back up to the beginning of the list _every time_ it encounters a pair, whereas the outer loop in my algorithm iterates only as many times as the _depth_ of nested pairs (plus one). – zwol Jul 07 '17 at 20:10
@zwol It can't possibly be average case O(words^2). Take the input NWSE, for example. It takes exactly one pass through the input. – user94559 Jul 07 '17 at 20:14
1

@pyjg Consider `lst = lst[:i] + lst[i+2:]` as in my recursive example. (Also, never call something `list` in Python... you're shadowing a built-in.) – user94559 Jul 07 '17 at 20:15
@smarx That's the _best_ case: nothing to do. Any correct algorithm for this problem will be best-case O(n). A fair "average" for this problem should include a number of pairs proportional to the length of the string, and a maximum depth of nested pairs logarithmically proportional to the length of the string. – zwol Jul 07 '17 at 20:16
@zwol You said it's worst-case O(words^2) and average-case O(words^2). That can only be true if it's also best-case O(words^2), which it isn't. If you want to make assumptions about the distribution of inputs, then we can determine which solution is faster. As I said a while ago, this is faster for some inputs, and yours is faster for others. I don't think the problem statement gives us any reason to assume a particular distribution of inputs. – user94559 Jul 07 '17 at 20:22
1

Of course, @Blckknght's solution seems optimal. :-) I believe it's O(n+k) where n is the length of the list and k is the number of pairs removed. – user94559 Jul 07 '17 at 20:24
@smarx You seem to be using a different definition of big-O than the one I'm used to; the assertion that worst-case quadratic + average-case quadratic implies best-case quadratic is nonsense to me. (Bubblesort is the obvious example of an algorithm that everyone agrees (I thought) to be best-case linear but average and worst case quadratic.) I already said what I thought a good definition of "average case" for this problem would be, and maybe we should just drop it at this point. – zwol Jul 07 '17 at 20:24
@zwol If I understand you properly, your complaint about this solution is that it's slower than yours for a type of input *you* decided was "average." That's simply an invalid criticism. As I keep saying, the same criticism could be leveled at your answer by just choosing a different "average" input. – user94559 Jul 07 '17 at 20:27
@zwol I think you're right about average-case complexity. That should be the average over all possible inputs... it's not immediately clear to me how to compute that for either of the solutions we're discussing. – user94559 Jul 07 '17 at 20:29
@smarx could you point me to a resource that explains why using list as a variable name is bad? I have heard it from everyone but I want to know the reason behind it.I tried to google but didnt find anything concrete. Thank you. – PYA Jul 07 '17 at 20:36
1

Python has a built-in called `list`. E.g., you might do `list(foo)` to convert `foo` to a list. But if you name a variable `list`, then you can no longer access that built-in. – user94559 Jul 07 '17 at 20:49
@zwol Oops, my example of NSNSNS isn't actually faster for your solution... it should be the same. I think ENSNSNS is a better example. (This solution has to look at the E over and over, while yours doesn't.) – user94559 Jul 07 '17 at 20:49
@zwol Also, I think I'm coming around to the idea that this might be worse in the average case (over all possible inputs), though I'm not sure how to prove it one way or the other. – user94559 Jul 07 '17 at 20:50

zwol · Answer 4 · 2017-07-07T19:35:51.137

There are a lot of mistakes in this function. The most important two local mistakes are:

arr.index(direction) will always find the first instance of direction in arr, even after you've gone on to the next one
you have del arr[x:y] in the middle of a loop over arr, which will have unpredictable results.¹

But also, I don't think your algorithm does what it's supposed to do. I would write this like this:

import re
_rcd_removals = re.compile(
    r"\b(?:SOUTH NORTH|NORTH SOUTH|WEST EAST|EAST WEST)\b")

def remove_cancelling_directions(dirs):
    """DIRS is a list of uppercase cardinal directions as strings
       (NORTH, SOUTH, EAST, WEST).  Remove pairs of elements that
       cancel each others' motion, e.g. NORTH immediately followed
       by SOUTH.  Modifies DIRS in place and returns nothing."""

    dirs[:] = _rcd_removals.sub("", " ".join(dirs)).split()

Regexes are great at sliding-window edits to sequences of words. Working on this as an array is going to be a lot finickier.

If you need to collapse pairs that are only visible after other pairs have been collapsed (e.g. WEST SOUTH NORTH EAST should become the empty list), then you should iterate to a fixed point:

import re
_rcd_removals = re.compile(
    r"\b(?:SOUTH NORTH|NORTH SOUTH|WEST EAST|EAST WEST)\b")
_rcd_fixwhite = re.compile(" {2,}")

def remove_cancelling_directions_repeatedly(dirs):
    ds = " ".join(dirs)
    prev_ds = None
    while prev_ds != ds:
        prev_ds = ds
        ds = _rcd_removals.sub("", ds)
        ds = _rcd_fixwhite.sub(" ", ds)
   dirs[:] = ds.split()

¹ Can anyone point me at official Python documentation stating that the effect of mutating a list while iterating over it is unpredictable? I know that it is, but I can't find an official rule. (The spec for list goes out of its way to point out that mutating a list while sorting it has undefined behavior.)

The input isn't a string; it's a list. Converting to a string to use a regular expression seems like a bad idea for two reasons: the unnecessary conversion and the use of regular expressions for no particular gain. — user94559, Jul 07 '17 at 19:50
Using regexes means no finicky index calculations. That's sufficient reason all by itself in my book. — zwol, Jul 07 '17 at 20:01

Why is this python loop containing consecutive duplicate items breaking?

4 Answers4