
I am trying to do simple 'NLP' in Python using functions.

For some reason, whenever I run my code, the first string works fine, but I get an error message ('list index out of range') whenever I run the second string.

def sentence_to_words(s):
    s=s.lower()
    s=s.split(" ")
    lst=["$", "#", "%", "!", "?", ".", ","]
    for i in range(len(s)):
        s[i]=list(s[i])
        while s[i][0] in lst:
            del s[i][0]
        while s[i][-1] in lst:
            del s[i][-1]
        s[i]=''.join(s[i])
    return s

print(sentence_to_words("Will this work?"))
print(sentence_to_words("Mr. Stark ... I don't feel so good"))

The end result for both should be:

['will', 'this', 'work']
['mr', 'stark', 'i', "don't", 'feel', 'so', 'good']

But the second one doesn't run; I get an error message instead.
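One way to narrow this down is to run the same two loops on each word separately and record which word raises. In the sketch below, strip_word copies the question's loop logic unchanged, and find_failing_words is a hypothetical debugging helper that is not part of the original code.

```python
# Same characters as lst in the question.
PUNCT = ["$", "#", "%", "!", "?", ".", ","]


def strip_word(word):
    """The question's two while loops, applied to a single word (logic unchanged)."""
    chars = list(word)
    while chars[0] in PUNCT:
        del chars[0]
    while chars[-1] in PUNCT:
        del chars[-1]
    return "".join(chars)


def find_failing_words(sentence):
    """Hypothetical debugging helper: return the words that make strip_word raise IndexError."""
    failing = []
    for word in sentence.lower().split(" "):
        try:
            strip_word(word)
        except IndexError:
            failing.append(word)
    return failing


print(find_failing_words("Will this work?"))                     # []
print(find_failing_words("Mr. Stark ... I don't feel so good"))  # ['...']
```

Only "...", which consists entirely of characters from lst, triggers the error.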

zero323
Heba Masarwa

3 Answers


There is an error in both of your while loops that is not obvious at first glance. The first loop deletes index 0 of s[i] as long as that character is in lst. For your second input, one of the words is "...", so s[i] is the list ['.', '.', '.'].

So your first while loop runs three times and empties s[i]. Its condition is then checked once more, and s[i][0] fails because an empty list has no index 0; the second loop's s[i][-1] would fail the same way, since there is no index -1 either. All you need to do is stop each loop once the list is empty. Here is your code:

def sentence_to_words(s):
    s = s.lower()
    s = s.split(" ")
    lst = ["$", "#", "%", "!", "?", ".", ","]
    for i in range(len(s)):
        s[i] = list(s[i])
        while s[i][0] in lst:
            del s[i][0]
            if not s[i]:
                break
        while s[i] and s[i][-1] in lst:
            del s[i][-1]
        s[i] = ''.join(s[i])
    return s

print(sentence_to_words("Will this work?"))
print(sentence_to_words("Mr. Stark ... I don't feel so good"))

There are two changes. First, at the end of each iteration of the first while loop, we check whether the list has become empty; if it has, we break out of the loop, which avoids the first error.

The second change is at the start of the second while condition: we first check that s[i] is not empty. Python treats an empty list as False in a boolean context, and the and operator stops evaluating as soon as its left side is false, so s[i][-1] is never reached for an empty list. Now we no longer get any error.
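The short-circuit behaviour can be checked in isolation. This is a small sketch; strip_trailing is a hypothetical name for the answer's second loop pulled out into a function.

```python
PUNCT = ["$", "#", "%", "!", "?", ".", ","]


def strip_trailing(chars):
    """The answer's second loop: the `chars and` check runs before chars[-1]."""
    while chars and chars[-1] in PUNCT:
        del chars[-1]
    return chars


empty = []
# `and` returns its first falsy operand, so empty[-1] is never evaluated here.
print(empty and empty[-1] in PUNCT)   # []
print(strip_trailing([]))             # []
print(strip_trailing(list("work?")))  # ['w', 'o', 'r', 'k']
print(strip_trailing(list("...")))    # []
```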

You could also remove the if statement at the end of the first loop and guard its condition the same way as the second one; I only put it there to show that you can solve this problem in different ways.
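Note that with this version "..." is not dropped from the result; it becomes an empty string. Tracing the code above by hand, the second sentence gives ['mr', 'stark', '', 'i', "don't", 'feel', 'so', 'good']. If that empty entry is unwanted, a list comprehension can filter it out, because empty strings are falsy. The words list in this sketch is hard-coded from that hand trace rather than computed.

```python
# Result of the fixed function above for the second sentence, hard-coded from a hand trace.
words = ['mr', 'stark', '', 'i', "don't", 'feel', 'so', 'good']

# Empty strings are falsy, so this keeps only the non-empty words.
cleaned = [w for w in words if w]
print(cleaned)  # ['mr', 'stark', 'i', "don't", 'feel', 'so', 'good']
```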

Mr Alihoseiny

In the second example, the "..." word is what causes the issue.

All of the characters in that word are in the list of characters to remove.

This means that the first while loop removes all of them, but when its condition then tries to access the "first" character again, you hit an error, since the list is empty!
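Here is a small trace of that first loop on "..." as a sketch: it records the list before each condition check and catches the error raised by the final check.

```python
PUNCT = ["$", "#", "%", "!", "?", ".", ","]

chars = list("...")    # ['.', '.', '.']
history = [chars[:]]   # snapshot of the list before each condition check
error = None
try:
    while chars[0] in PUNCT:      # same condition as the question's first loop
        del chars[0]
        history.append(chars[:])  # record the list after each deletion
except IndexError as exc:
    error = str(exc)

for state in history:
    print(state)
print("error:", error)
```

This prints ['.', '.', '.'], ['.', '.'], ['.'] and [], followed by "error: list index out of range": the fourth check of chars[0] is the one that fails.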

A quick fix is to add an extra condition, s[i] and ..., so that when the list is empty the condition is false and the while loop stops instead of indexing into it.

Finally, you have to decide what to do with the resulting empty string, because you can't just leave it in the output.

Ideally you would delete it from the s list, but since you are iterating over s by index, deleting items would shift the remaining elements while range(len(s)) still runs to the original length, so this will not work.

Instead, it makes more sense to build a new output list that you append the "parsed" words to.

Here's that in code:

def sentence_to_words(s):
    s = s.lower()
    s = s.split(" ")
    lst = ["$", "#", "%", "!", "?", ".", ","]
    output = []
    for i in range(len(s)):
        s[i] = list(s[i])
        while s[i] and s[i][0] in lst:
            del s[i][0]
        while s[i] and s[i][-1] in lst:
            del s[i][-1]
        if s[i]:
            output.append(''.join(s[i]))
    return output

print(sentence_to_words("Will this work?"))
print(sentence_to_words("Mr. Stark ... I don't feel so good"))

Now it runs as expected:

>>> sentence_to_words("Will this work?")
['will', 'this', 'work']
>>> sentence_to_words("Mr. Stark ... I don't feel so good")
['mr', 'stark', 'i', "don't", 'feel', 'so', 'good']
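For comparison, the two while loops together do what str.strip does when given a set of characters: remove any leading and trailing characters from that set. A sketch of the same function built on it, keeping the output-list idea from this answer:

```python
# Same characters as lst in the question, as one string for str.strip.
PUNCT = "$#%!?.,"


def sentence_to_words(s):
    """Sketch: str.strip(PUNCT) removes any leading/trailing characters found in PUNCT."""
    words = (w.strip(PUNCT) for w in s.lower().split(" "))
    # Drop words that consisted only of punctuation, such as "...".
    return [w for w in words if w]


print(sentence_to_words("Will this work?"))
print(sentence_to_words("Mr. Stark ... I don't feel so good"))
```

str.strip never indexes into an empty string, so no emptiness guard is needed, and the final filter also drops the empty strings that split(" ") produces for repeated spaces.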
Joe Iddon

Try this:

def sentence_to_words(s):
    mylist = []
    s = s.lower()
    s = s.split(' ')
    for i in s:
        mylist.append(''.join(ch for ch in i if ch.isalnum()))
    return list(filter(None, mylist))
blue
  • It worked! But sadly I don't really understand what you did! – Heba Masarwa Nov 17 '18 at 16:44
  • First I split the string, just like you did, and made it all lowercase. Then, for each item in the list, I removed any non-alphanumeric characters with that generator expression. Finally, I returned the list with the empty strings removed, using list(filter(None, ...)). :) – blue Nov 17 '18 at 16:46
  • Oh, thank you! I am kind of new to coding, so I was a little confused. – Heba Masarwa Nov 17 '18 at 16:54
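One detail about the filter approach: str.isalnum() is False for the apostrophe, so "don't" comes back as "dont" rather than the "don't" the question expects. Below is a sketch of a variant that also keeps apostrophes, assuming those are the only extra characters worth keeping.

```python
def sentence_to_words(s):
    """Variant of the answer above that keeps letters, digits and apostrophes (an assumption)."""
    words = []
    for token in s.lower().split(" "):
        words.append("".join(ch for ch in token if ch.isalnum() or ch == "'"))
    # filter(None, ...) drops the empty strings, as in the answer.
    return list(filter(None, words))


print(sentence_to_words("Mr. Stark ... I don't feel so good"))
```

Unlike the two while loops, this removes characters from the middle of a word as well, so "e-mail" would become "email".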