42

I wanted to know how to iterate through a string word by word.

string = "this is a string"
for word in string:
    print (word)

The above gives this output:

t
h
i
s

i
s

a

s
t
r
i
n
g

But I am looking for the following output:

this
is
a
string
Pavel
m0bi5
  • Pretty closely related previous question (though not an exact duplicate) is http://stackoverflow.com/questions/6181763/converting-a-string-to-a-list-of-words – paisanco Aug 06 '15 at 01:48

7 Answers

87

When you write

for word in string:

you are not iterating through the words in the string; you are iterating through its characters. To iterate through the words, you first need to split the string into words using str.split(), and then iterate through the resulting list. Example:

my_string = "this is a string"
for word in my_string.split():
    print (word)

Please note that str.split(), when called without arguments, splits on any run of whitespace (spaces, tabs, newlines, etc.).
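
As a small illustration of that whitespace behaviour, here is a sketch comparing split() with no argument against split(" "). The sample text is made up for this example.

```python
# Hypothetical sample text with a double space, a tab, and a newline.
text = "this  is\ta\nstring"

# No argument: any run of whitespace counts as one separator,
# and no empty strings are produced.
default_words = text.split()

# Explicit " ": every single space is a separator, so the double space
# yields an empty string, and the tab and newline are not split on.
space_words = text.split(" ")

print(default_words)
print(space_words)
```

For word-by-word iteration, the no-argument form is usually the one you want.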

Olivier Pons
Anand S Kumar
  • Hey, is there a way to preserve all the spaces and do the same? – m0bi5 Aug 06 '15 at 01:57
  • @MohitBhasi Maybe you misunderstood? `str.split()` does not work in place; it just returns a list of the split strings, and the original string is left intact. – Anand S Kumar Aug 06 '15 at 02:13
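
If the goal in the comment above is to keep the whitespace itself as part of the result, one possible approach (an assumption about what was being asked, not something taken from the answer) is to collect words and whitespace runs as separate tokens with the standard re module:

```python
import re

# Hypothetical sample text with a double space and a tab.
text = "this  is\ta string"

# \S+ matches a word, \s+ matches a run of whitespace, so every
# character of the original string ends up in exactly one token.
tokens = re.findall(r"\S+|\s+", text)

# The words alone are the tokens that are not pure whitespace.
words = [token for token in tokens if not token.isspace()]

print(tokens)
print(words)
```

Joining the tokens back together reproduces the original string, and `text` itself is never modified.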
10

This is one way to do it:

string = "this is a string"
ssplit = string.split()
for word in ssplit:
    print (word)

Output:

this
is
a
string
Joe T. Boka
2
string = "this is a string"
for word in string.split():
    print(word)
Connor
  • You should explain the `split` method; don't expect everyone to know what it does or why you used it. – jpaugh Aug 06 '15 at 03:34
  • My comment was terse, but not meant as an insult. I just want Stack Overflow to be the best it can be. Code-only answers are hard to read and understand, especially for those who "don't know" what seems obvious to you. That's why they are looking here for an answer. – jpaugh Aug 06 '15 at 18:11
2

Using nltk.

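from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize("This is a string.")
# word_tokenize expects a single string, so tokenize each sentence separately
words_in_each_sentence = [word_tokenize(sentence) for sentence in sentences]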

You may use TweetTokenizer for parsing casual text with emoticons and such.
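
To show why a tokenizer can be preferable to a plain split(), here is a standard-library sketch that only roughly imitates one thing a tokenizer does: separating punctuation from words. The regular expression is an assumption made for illustration; it is not nltk's algorithm and handles far fewer cases.

```python
import re

sentence = "This is a string."

# str.split() leaves the full stop attached to the last word.
split_words = sentence.split()

# Rough approximation only: runs of word characters, or any single
# character that is neither a word character nor whitespace.
rough_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(split_words)
print(rough_tokens)
```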

noɥʇʎԀʎzɐɹƆ
0

One way to do this is with a dictionary. The problem with the code in the question is that it iterates over each letter of the string instead of each word. To fix this, first turn the string into a list of words with the split() method, then loop over that list and count each word. The code below prints the number of times each word appears in the string, in the form of a dictionary.

    s = input('Enter a string to see if strings are repeated: ')
    d = dict()
    p = s.split()  # list of words
    for word in p:
        if word not in d:
            d[word] = 1  # first time this word is seen
        else:
            d[word] += 1
    print (d)
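
The same counting can also be sketched with collections.Counter from the standard library. This is an alternative to the loop above, not the answer's own method, and the sample sentence is made up so that the example runs without user input.

```python
from collections import Counter

# Hypothetical sample text used instead of input().
s = "the cat and the hat"

# Counter takes any iterable and tallies how often each item occurs.
counts = Counter(s.split())

print(counts)
print(counts["the"])
```

A Counter behaves like a dictionary, so `counts["the"]` looks up the tally for one word.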
-2
s = 'hi how are you'
# str.split() already returns a list, so no map() or lambda is needed
words = s.split()
print(words)

Output: ['hi', 'how', 'are', 'you']

Nanda Thota
-2

You can also try this method:

sentence_1 = "This is a string"
list = sentence_1.split()
for i in list:
    print (i)

  • This is the same solution as in [this other answer](https://stackoverflow.com/a/31845501/2227743). – Eric Aya Aug 09 '22 at 14:15
  • This is the same solution as other answers and contains no explanation of how or why it works. Additionally, the use of "list" as a variable name is a dangerous practice because it shadows the builtin name and can cause unexpected results. – drowningincode Aug 12 '22 at 19:56
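
To make the shadowing concern in the last comment concrete, here is a minimal sketch. The function name shadowing_demo is hypothetical, and the shadowing is kept inside the function so it does not affect the rest of the program.

```python
def shadowing_demo():
    # Assigning to the name "list" hides the builtin list type
    # for the rest of this function.
    list = "This is a string".split()
    try:
        # This would normally build ['a', 'b', 'c'], but "list" now
        # refers to a list object, which cannot be called.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)
```

Choosing a different name, such as words, avoids the problem entirely.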