5

I needed to remove the characters in string1 which are present in string2. Here string1 and string2 have only the lower case characters a-z with given condition that the length of string1 will be greater every time.

I was using the in operator:

def removeChars (string1, string2):
    for char in string2:
        if char in string1:
            string1 = string1.replace(char, '')
    return string1

But I read one answer on Stack Overflow which says:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

Which means the in operator is using a for loop behind the scenes.

So my question is, in the for loop in my code, should I consider that nested for loops are being used, as the in operator is using a for loop in the background? If yes, what would be the time complexity of this program?

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
  • 1
    You might find [this](https://stackoverflow.com/questions/18139660/python-string-in-operator-implementation-algorithm-and-time-complexity) interesting. – Seon Aug 31 '21 at 09:48
  • 1
    Check this out: https://medium.com/analytics-vidhya/how-to-find-the-time-complexity-of-a-python-code-95b0237e0f2d – serax Aug 31 '21 at 09:52
  • 3
    "Which means 'in' operator is using for loop behind the scene." **no it doesn't**. That caveat is merely about the *semantics*. For example, a `dict` or `set` *involves no looping*, rather, it is a hash-based lookup. The point that it is making is that `in` will check for identity first as an optimization *then* equality. – juanpa.arrivillaga Aug 31 '21 at 09:53
  • 1
    Are you asking about ``in`` in general or for string specifically? – MisterMiyagi Aug 31 '21 at 09:54
  • 2
    In *this particular case*, yes, `in` between strings will involve a linear time algorithm – juanpa.arrivillaga Aug 31 '21 at 09:54
  • Just a small note. You don't have to keep the ```if``` statement in your code. You can simply run ```for char in string2: string1.replace(char,"")```. If the char is there it will be replaced. If not it will simply pass along. – glory9211 Aug 31 '21 at 09:54
  • @MisterMiyagi I have just given this string problem as an example. My doubt for 'in' operator was in general. – Ram Shankar Kumar Aug 31 '21 at 10:06
  • 1
    *More importantly*, the *semantics* of the `in` operator between two `str` objects, e.g. `s1 in s2` **is not** equivalent to `any(s1 is e or s1 == e for e in s2)`. `str` objects "sorta" act like containers, but they aren't really. – juanpa.arrivillaga Aug 31 '21 at 10:10
  • @RamShankarKumar That's a tad broad then. The `in` operator can implement [any arbitrary binary operation](https://docs.python.org/3/reference/datamodel.html#object.__contains__). It doesn't have any specific complexity by itself. – MisterMiyagi Aug 31 '21 at 10:12
  • @MisterMiyagi I think it is specifically about that quote from the documentation, though. – juanpa.arrivillaga Aug 31 '21 at 10:13
  • @juanpa.arrivillaga I'd prefer not having to guess. – MisterMiyagi Aug 31 '21 at 10:14

3 Answers3

7

in does not necessarily use loops behind the scenes. For example:

r = range(100000000000)
print(333 in r)  # prints True immediately without looping

If you were to loop r it will take quite a long time, so clearly that doesn't happen.

in basically calls (behind the scenes) the object's __contains__ method. For some iterators it will in fact "loop" through everything but that's not always the case.

This example is basically the same as calling:

r.__contains__(333)

As pointed out in comments - the str objects specifically have a smarter algorithm than just plain loops, as you can see here

Also see example answer here

And see the documentation here

Because the real world scenario would probably mean that string1 can be arbitrarily long, but the characters to be removed would be a finite and small set, it will probably be much more efficient to just add up all the characters that aren't in string2. Something like this:

def removeChars (string1, string2):
    result = ''
    for char in string1:
        if char not in string2:
            result += char
    return result

This will involve looping over string1 just once, but multiple checks against string2 using in. This can be further simplified (to avoid += loop over the result):

def removeChars (string1, string2):
    return ''.join(char for char in string1 if char not in string2)
Ofer Sadan
  • 11,391
  • 5
  • 38
  • 62
  • Also not for `str` (https://stackoverflow.com/questions/18139660/python-string-in-operator-implementation-algorithm-and-time-complexity). The complexity will be something with `O(n^2)`. `O(n^2m)` in the worst case. – Leonard Aug 31 '21 at 09:56
  • The question is pretty explicitly talking about strings, so not mentioning this case seems to miss the point. – MisterMiyagi Aug 31 '21 at 09:57
  • @MisterMiyagi you are right, I've added some small clarification about strings – Ofer Sadan Aug 31 '21 at 10:01
  • 2
    *nnooooo*. This might result in catastrophic quadratic behavior, `result += char` in a loop is going to be quadratic time, although, there is an optimization in the CPython runtime that avoids this, it is not recommended to rely on it, and you should always use a `list` and then `''.join` at the end – juanpa.arrivillaga Aug 31 '21 at 10:17
  • @juanpa.arrivillaga my last example is exactly that – Ofer Sadan Aug 31 '21 at 10:18
  • Sure, but those two are not the same, algorithmically. The latter is much less efficient than the former – juanpa.arrivillaga Aug 31 '21 at 10:19
  • 2
    As mentioned in a comment on another answer, strings already have a method to remove arbitrary many characters at once: ``str.translate`` – MisterMiyagi Aug 31 '21 at 10:20
  • @juanpa.arrivillaga It *might* [remain linear](https://stackoverflow.com/questions/44487537/why-does-naive-string-concatenation-become-quadratic-above-a-certain-length#comment113882185_44488420) if the OS helps. Ofar: (1) `333 in r` proves nothing, would finish immediately even with iteration. Do sth like `10**99 in range(10**100)` instead. (2) It won't just loop for "some" iterators but likely for most/all. Even `10**99 in iter(range(10**100))` does! (3) The smart string search algo you link to is irrelevant, since we search single characters. Iirc, it isn't even used at all then. – no comment Aug 31 '21 at 10:40
2

When in is equivalent to a loop, it has to 'walk' the iterator and compare each item in turn.

This means each loop is O(n), so for 2 levels this is O(n²)

https://wiki.python.org/moin/TimeComplexity

Note that you actually have 3 loops here - since your replace will also walk through the string.

Since replace doens't raise any errors if char isn't found, it is simpler to just do this without first testing char in string1.

match
  • 10,388
  • 3
  • 23
  • 41
  • `in` is not necessarily equivalent to a loop. It depends on the types involved – juanpa.arrivillaga Aug 31 '21 at 09:53
  • 1
    Heads up that since this is about removing many characters, using a [translation table](https://docs.python.org/3/library/stdtypes.html#str.translate) is likely the best complexity at O(n+m). – MisterMiyagi Aug 31 '21 at 09:56
  • 1
    @juanpa.arrivillaga yes - hence my saying "*When* `in` is equivalent to a loop". I was going to write a longer explanation, but that was covered by Ofer above. – match Aug 31 '21 at 09:58
0

You do have a nested for loop, but, it doesn't need to execute completely (on average), only until it reaches a match.

So best case is O(n) -- if all elements of each list match, then for each element of list 1, it takes one step of the inner loop to determine that it matches.

Worst Case is O(n^2) -- if only the very last element of list 2 matches, every iteration of the outer loop will need to test inner loop n times.

At least, that's how I understand it, but, not an expert.

Halbert
  • 142
  • 7