
I know that str.count(sub) returns the number of occurrences of substring sub, but I've run into a strange result that I cannot understand.
My code is as below:

s = 'helloworld'
print(s.count(''))  # prints 11

The output is puzzling. Why does it return 11? If '' meant "any letter", shouldn't it return the length of the string (10)? I haven't found an answer to this anywhere. Can anybody explain how this is implemented?

jonrsharpe
Yixin
    `''` doesn't mean any letters, it means an empty string. There are 11 empty strings in `'helloworld'`, one at the start, nine between each pair of letters and one at the end. If you want the length, use `len('helloworld')`. – jonrsharpe Apr 22 '18 at 16:00
    A string starts with an empty string and also ends with one – so always one more than the number of characters. – Jongware Apr 22 '18 at 16:01
  • Try looking at str.count.__doc__ – Scott Hunter Apr 22 '18 at 16:02
    Think of `"foo"` as being equal to `"" + "f" + "" + "o" + "" + "o" + ""`, which is in some sense the minimal number of empty strings to be found contained in the string `"foo"`. – chepner Apr 22 '18 at 16:15
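To make chepner's decomposition concrete, here is a small sketch that writes a string out as its characters with an empty string placed before, between, and after them. The helper name `interleave_empty` is hypothetical and only illustrates the counting argument; it is not how `str.count` works internally.

```python
def interleave_empty(s: str) -> list:
    """Hypothetical helper: split s into '' + c1 + '' + c2 + ... + cN + ''."""
    pieces = ['']               # the empty string before the first character
    for ch in s:
        pieces += [ch, '']      # each character followed by an empty string
    return pieces


pieces = interleave_empty('foo')
print(pieces)                   # ['', 'f', '', 'o', '', 'o', '']
print(''.join(pieces) == 'foo') # True: joining the pieces rebuilds the string
print(pieces.count(''))         # 4, same as 'foo'.count('')
```

For `'helloworld'` the same construction yields 1 + 9 + 1 = 11 empty pieces, matching the count in the question.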

1 Answer


'' doesn't mean any string; it means no string (that is, the empty string, or the 0-length string). Strictly speaking, there are infinitely many 0-length strings in a string, but in practice len(string) + 1 is returned: one for the position just before the first character, and one for the position after each character.
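As an independent cross-check (this uses the `re` module, not the `str.count` code path), a regular expression with an empty pattern reports a zero-width match at exactly those positions: before the first character, and after each character.

```python
import re

s = 'helloworld'

# An empty pattern matches once at every boundary of the string;
# m.start() is the index of each zero-width match.
positions = [m.start() for m in re.finditer('', s)]

print(positions)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(positions))   # 11, i.e. len(s) + 1
print(len(positions) == s.count(''))  # True
```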

This scenario is explicitly special-cased in CPython's count.h:

if (sub_len == 0)
    return (str_len < maxcount) ? str_len + 1 : maxcount;

When the search string is empty, str_len + 1 is returned (capped at maxcount), so 'helloworld'.count('') gives len('helloworld') + 1 = 11.
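For readers more comfortable with Python than C, the quoted branch translates roughly as follows. This is a sketch: the function name is hypothetical, and defaulting `maxcount` to `sys.maxsize` is an assumption standing in for "no cap", not something `str.count` exposes.

```python
import sys


def empty_sub_count(str_len: int, maxcount: int = sys.maxsize) -> int:
    """Hypothetical Python rendering of the sub_len == 0 branch in count.h.

    maxcount defaults to sys.maxsize here as an assumed "no cap" value.
    """
    # C: return (str_len < maxcount) ? str_len + 1 : maxcount;
    return str_len + 1 if str_len < maxcount else maxcount


s = 'helloworld'
print(empty_sub_count(len(s)))                 # 11
print(empty_sub_count(len(s)) == s.count(''))  # True
print(empty_sub_count(len(s), maxcount=5))     # 5: the cap wins
```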

cs95
  • Thanks a lot, @COLDSPEED, your explanation is really helpful. – Yixin Apr 23 '18 at 00:56
  • @COLDSPEED Sorry to bother you again. When I looked into the implementation you mentioned above (count.h), there is one line I don't understand (line 14): `if (str_len < 0) return 0;`. Why can str_len be less than 0? – Yixin Apr 23 '18 at 07:31
  • @Yixin Without much knowledge of Python internals, it looks like that handles another invalid case: calling str.count with a "start" parameter (the index to begin searching from) that is larger than the string's length. By default, start = 0. – cs95 Apr 23 '18 at 07:59
  • You mean the variable str_len here is not actually the length of the original string, but maybe a value calculated from the "start" parameter? – Yixin Apr 23 '18 at 08:48
  • @Yixin I imagine it is calculated by whoever calls it. `str_len = len(str) - start`, and it is possible to go below 0 when start > len(str). – cs95 Apr 23 '18 at 08:52
  • Got it. I understand the count() method much better now, you really helped :-D – Yixin Apr 23 '18 at 09:00
  • @Yixin The change was first made in [GitLab (SVN?) commit b51b470e](https://gitlab.merchise.org/merchise/cpython/commit/b51b470eb86ded7d3ea26081ca8bc89b4448f962). [The function is in fact called with `end - start` as `str_len`.](https://github.com/python/cpython/blob/09a9f1799c8c58f573c50cb2d526422436b8658b/Objects/unicodeobject.c#L9433) – Solomon Ucko Mar 18 '19 at 11:41
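The thread's explanation can be checked with a small model. This is a sketch under stated assumptions: the name `model_count_empty` is hypothetical, and it assumes the caller normalizes `end` like a slice (clamped to `len(s)`, negatives wrapped), wraps negative `start` but does not clamp a large `start` down to `len(s)`, and then passes `end - start` as `str_len`, as the comments above describe.

```python
from typing import Optional


def model_count_empty(s: str, start: int = 0, end: Optional[int] = None) -> int:
    """Hypothetical model of s.count('', start, end)."""
    n = len(s)
    # Normalize end: default/clamp to len(s), wrap negative values.
    if end is None or end > n:
        end = n
    elif end < 0:
        end = max(end + n, 0)
    # Normalize start: wrap negative values, but leave start > len(s) as is.
    if start < 0:
        start = max(start + n, 0)
    str_len = end - start
    if str_len < 0:          # e.g. start > len(s): the guard asked about above
        return 0
    return str_len + 1       # the sub_len == 0 special case


s = 'helloworld'
print(model_count_empty(s))      # 11
print(model_count_empty(s, 3))   # 8
print(model_count_empty(s, 10))  # 1
print(model_count_empty(s, 11))  # 0, because str_len would be -1
```

Note the quirk this exposes: `s.count('', 11)` is 0, whereas `s[11:].count('')` is 1, because slicing clamps the start index while the count path sees a negative `str_len`.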