Hello, I'm new to Python and I'm writing a module that should take a string as input and return a list of each word, number, or symbol, with no whitespace, e.g. (' 10 sweet apples') --> ['10', 'sweet', 'apples']. To do this I have a start value that marks the current index and an end value that is incremented as long as the next character in the string is a letter or digit. So far I've successfully added words, numbers, symbols, etc. to a list that is returned after the for loop finishes.
My problem occurs when I reach the last index. I have this code:
def tokenize(lines):
    tokenizedList = []
    for line in lines:
        endValue = 0
        startValue = 0
        while startValue < len(line):
            if line[endValue].isalpha():
                while line[endValue].isalpha():
                    endValue = endValue + 1
                word = line[startValue : endValue]
                tokenizedList.append(word)
                startValue = endValue
            elif line[endValue].isdigit():
                while line[endValue].isdigit():
                    endValue = endValue + 1
                word = line[startValue : endValue]
                tokenizedList.append(word)
                startValue = endValue
            elif line[endValue].isspace():
                while line[endValue].isspace():
                    startValue += 1
                    endValue = startValue
            else:
                endValue += 1
                word = line[startValue : endValue]
                tokenizedList.append(word)
                startValue = endValue
    return tokenizedList
Since the while loops inside the if-statements increment endValue, it eventually goes past the last valid index of the string and raises an IndexError. I can't figure out how to stop this error from occurring, or how the while loops should be altered so they know to stop after the last character. Any ideas?
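
One direction that might work is to test the bound before indexing, so that `line[end]` is never evaluated once `end` reaches `len(line)`. Below is a minimal sketch of that idea, assuming the same rules as the code above (runs of letters, runs of digits, single symbols, whitespace skipped). The name `tokenize_checked` and the shortened variable names are hypothetical, chosen only for this illustration.

```python
def tokenize_checked(lines):
    """Split each line into runs of letters, runs of digits, and single symbols."""
    tokens = []
    for line in lines:
        start = 0
        n = len(line)
        while start < n:
            end = start
            if line[start].isalpha():
                # Check the bound first: `and` short-circuits, so line[end]
                # is not evaluated when end == n.
                while end < n and line[end].isalpha():
                    end += 1
                tokens.append(line[start:end])
            elif line[start].isdigit():
                while end < n and line[end].isdigit():
                    end += 1
                tokens.append(line[start:end])
            elif line[start].isspace():
                # Skip whitespace without producing a token.
                while end < n and line[end].isspace():
                    end += 1
            else:
                # Any other character becomes a one-character token.
                end += 1
                tokens.append(line[start:end])
            start = end
    return tokens


print(tokenize_checked([" 10 sweet apples"]))  # ['10', 'sweet', 'apples']
```

The only real change from the original is the `end < n and ...` guard in each inner loop. Because the bound check comes first, the loop stops cleanly when a word or number runs up to the end of the line.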