156

What is the complexity of the in operator in Python? Is it theta(n)?

Is it the same as the following?

def find(L, x):
   for e in L:
       if e == x:
           return True
   return False

L is a list.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Sajad Rastegar
  • 3,014
  • 3
  • 25
  • 36
  • 6
    It depends on the type of container, since using it with a dictionary or set will be much faster than with an array. – Greg Hewgill Dec 14 '12 at 18:19
  • 1
    @BasicWolf I have used L, so it is list – Sajad Rastegar Dec 14 '12 at 18:29
  • See the comments thread at the [ActiveState snippet _List with fast membership test (__contains__)_](http://code.activestate.com/recipes/577185-list-with-fast-membership-test-__contains__/). Summary: do timing tests. – Jim DeLaHunt Dec 14 '12 at 18:30
  • 8
    @Rastegar `L` doesn't imply a list. `seq` is the most common choice where one wants to imply a list. `L` is a terrible variable name. Single letter ones are bad, and the capital implies it's a class. Even if it was something in particular, Python is dynamic, so state it explicitly in a case like this. – Gareth Latty Dec 14 '12 at 18:50
  • 4
    `L` means `list`? My libtelepathy.so is probably outdated. – Zaur Nasibov Dec 14 '12 at 18:56
  • 1
    @GarethLatty Using *lst* is also a good name to define a `list` – vmemmap Sep 24 '20 at 20:02
  • For a normal list, your algorithm should be slower - not because it is semantically less efficient, but because Python will the "in" interpret in C as one function, while in yours, every line would be interpreted separately. (Of course, there are probably some optimizations...) – Ferazhu Apr 09 '21 at 09:37

3 Answers3

225

The complexity of in depends entirely on what L is. e in L will become L.__contains__(e).

See this time complexity document for the complexity of several built-in types.

Here is the summary for in:

  • list - Average: O(n)
  • set/dict - Average: O(1), Worst: O(n)

The O(n) worst case for sets and dicts is very uncommon, but it can happen if __hash__ is implemented poorly. This only happens if everything in your set has the same hash value.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
22

It depends entirely on the type of the container. Hashing containers (dict, set) use the hash and are essentially O(1). Typical sequences (list, tuple) are implemented as you guess and are O(n). Trees would be average O(log n). And so on. Each of these types would have an appropriate __contains__ method with its big-O characteristics.

kindall
  • 178,883
  • 35
  • 278
  • 309
  • of value is to include the overhead of generating the hash. – Woot4Moo Dec 14 '12 at 18:20
  • Hashing data types include `dict` and `set` (as wells as potentially others) – Dave Dec 14 '12 at 18:20
  • 2
    @Woot4Moo: When you're talking about asymptotic complexity, that isn't relevant. The overhead of generating the hash is constant. When you're dealing with small values of N, profiling becomes important, because, say, 100 >> 2N for small N. But that's a separate issue from what the OP was asking about; for huge N, 100 << 2N, which is what complexity is all about. – abarnert Dec 14 '12 at 19:15
  • @abarnert well it actually is quite relevant, as you don't arbitrarily choose data structures. You must consider the use and most common ways the structure will be used, so it actually is important to consider the amount of time for a hash function, especially in a scenario where the has must be computed per iteration of a program. – Woot4Moo Dec 14 '12 at 19:17
  • @Woot4Moo: If someone is asking about asymptotic complexity, either (a) they expect to deal with a large N, or (b) they're an idiot. I'm assuming the OP is case (a), but either way, the constant factor isn't relevant to the answer. – abarnert Dec 14 '12 at 20:34
  • I would expect the overhead of hashing to not be quite constant (the number of bytes being hashed would be a factor) but we can probably treat it as such so long as that doesn't correlate strongly with *n.* – kindall May 03 '22 at 14:51
-1

It depends on the container you're testing. It's usually what you'd expect - linear for ordered datastructures, constant for the unordered. Of course, there are both types (ordered or unordered) which might be backed by some variant of a tree.

Marcin
  • 48,559
  • 18
  • 128
  • 201
  • @ZoranPavlovic `A in B` tests whether `A` is in `B`. – Marcin Apr 24 '14 at 13:50
  • 1
    I'd definitely expect logarithmic time in an ordered structure. – dedObed Dec 13 '19 at 09:21
  • @dedObed Why would you expect that? Do you expect python to already know whether or not your data are sorted? – Marcin Dec 13 '19 at 19:18
  • Because if there is a container designed to be ordered, the obvious reason is to allow for logarithmic lookups. But I guess it's just a naming issue, I'd use "linear" where you wrote "ordered" and all would be fine. (In my head -- English as second language here.) – dedObed Dec 14 '19 at 10:34