Explanation about split in Python

Question

I have this task.

st = 'print only the words that sstart with an s in the sstatement'

and the solution would be

for word in st.split():
    if word[0] == 's':
        print(word)

Why won't it work with:

for word in st.split():
    if word[1] == 's':
        print(word)

I kind of understand what that zero stands for, but how can I print the words whose second letter is 's'?

Your question is to print words that start with s. But now you want to print words where the second letter starts with s? — idjaw, Jun 24 '17 at 13:14
Also, what you're doing can't work if `word` is only one character long. — Thomas Kowalski, Jun 24 '17 at 13:14
It's working correctly; Python raises an IndexError when `word` is only one character long. — zaidfazil, Jun 24 '17 at 13:16
What about words that don't have a second letter (or even a first letter)? Also what exactly isn't working? Please provide a [mcve] including the expected output. :) — MSeifert, Jun 24 '17 at 13:19

Willem Van Onsem · Accepted Answer · 2017-06-24T13:32:54.467

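One of the problems is that a word is not guaranteed to be long enough. For instance, a one-character word such as 's' (which occurs in your sentence) has no index 1, so word[1] raises an IndexError. An empty string ('') cannot come out of st.split() called without arguments, but it can appear when you split on an explicit separator, for example st.split(' ') on text with consecutive spaces.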
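
To see the failure concretely, here is a minimal sketch that runs the question's loop and catches the exception. The helper name `second_letter_unchecked` is made up for illustration, and it collects matches in a list instead of printing them.

```python
st = 'print only the words that sstart with an s in the sstatement'

def second_letter_unchecked(sentence):
    """Hypothetical helper: the question's loop, returning matches as a list."""
    matches = []
    for word in sentence.split():
        if word[1] == 's':  # fails on one-character words such as 's'
            matches.append(word)
    return matches

result = None
error_message = ''
try:
    result = second_letter_unchecked(st)
except IndexError as exc:
    # The loop reaches the word 's', which has no index 1.
    error_message = str(exc)
    print('IndexError:', error_message)
```

The loop handles 'print' through 'an' without trouble and only fails once it reaches the lone 's'.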

A quick fix is to use a length check:

for word in st.split():
    if len(word) > 1 and word[1] == 's':
        print(word)

Or you can, like @idjaw says, use slicing, which yields an empty string when the index is out of range:

for word in st.split():
    if word[1:2] == 's':
        print(word)

Given a string st, st[i:j] yields the substring from index i (inclusive) up to index j (exclusive). Indices that are out of range are not a problem: the slice is simply clipped and may be the empty string. So word[1:2] starts at index 1 and stops before index 2. If the word has no index 1, we obtain the empty string (which is not equal to 's'); otherwise we obtain a string with exactly one character: the one at index 1.
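
A small sketch of that slicing behaviour, using a few made-up sample strings (they are not from the question) so the short cases are visible:

```python
samples = ['', 's', 'as', 'sstart']  # illustrative inputs only

for word in samples:
    # word[1:2] is the character at index 1 if it exists, otherwise ''
    print(repr(word), '->', repr(word[1:2]))

# The same slices collected in a list, one per sample
second_chars = [word[1:2] for word in samples]
```

The empty string and 's' both give '', so neither raises and neither compares equal to 's'; 'as' and 'sstart' both give 's'.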

If you need to check against more complicated patterns, however, you can use a regex:

import re

rgx = re.compile(r'\b\ws\w*\b')
rgx.findall('print only the words that sstart with an s in the sstatement')

Here we match anything between word boundaries \b that consists of one word character \w, followed by an s, followed by zero or more word characters \w*:

>>> rgx.findall('print only the words that sstart with an s in the sstatement')
['sstart', 'sstatement']
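
As a rough comparison, the sketch below checks that the regex and the slicing loop agree on the sentence from the question, and then shows one case where they differ. The `punctuated` input is made up for illustration.

```python
import re

st = 'print only the words that sstart with an s in the sstatement'
rgx = re.compile(r'\b\ws\w*\b')

by_regex = rgx.findall(st)
by_slice = [word for word in st.split() if word[1:2] == 's']
print(by_regex == by_slice)

# With punctuation the two approaches differ: split() keeps the comma
# attached to the word, while \b ends the regex match before it.
punctuated = 'is, so'  # hypothetical input, not from the question
regex_punct = rgx.findall(punctuated)
slice_punct = [word for word in punctuated.split() if word[1:2] == 's']
print(regex_punct)
print(slice_punct)
```

On the original sentence both approaches return the same two words. On the punctuated input the regex yields 'is' while the slicing loop yields 'is,', so which one fits depends on whether punctuation should count as part of a word.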

edited Jun 24 '17 at 13:32

answered Jun 24 '17 at 13:17

Willem Van Onsem

I was trying to be really anal and see if I can solve it using slicing and omit the length (just for fun), but that darn single character still requires the length check. – idjaw Jun 24 '17 at 13:24
Thank you, regex is a bit complicated for now but I will surely note that down. – Stelian Jun 24 '17 at 13:25
@idjaw You could use `word[1:2] == 's'`, that shouldn't require a length check :) – MSeifert Jun 24 '17 at 13:26
@MSeifert oh!!! gah! I was using `word[:2:2]`! Thanks :) Of course, now that I think about it, it is so obvious lol..... – idjaw Jun 24 '17 at 13:26
Thanks for the solution, but could you explain how that works? `word[1:2] == 's'`. I really want to understand Python and coding in general (quite new to this). – Stelian Jun 24 '17 at 13:30
@Stelian [this](https://stackoverflow.com/a/509295/1832539) answer explains slicing *perfectly*. – idjaw Jun 24 '17 at 13:32
