I have a function in Python that splits a sentence into words using a tokenizer. The problem is that when I run this function, the output returned is a single word with no spaces.
- actual sentence:
'is lovin Picture2Life.com!!! Y all fun apps r for iphone and not blackberry??!! '
- result:
'islovinpicturelifecomyallfunappsrforiphoneandnotblackberry'
The result should instead look like this: is loving picture 2 life . com....
code:
ppt = '''...!@#$%^&*()....{}’‘ “” “[]|._-`/?:;"'\,~12345678876543'''
from nltk.corpus import stopwords  # needed for stopwords.words() below

# tokenize helper function
def text_process(raw_text):
    '''
    parameters:
    =========
    raw_text: text as input
    functions:
    ==========
    - remove all punctuation
    - remove all stop words
    - return a list of the cleaned text
    '''
    # check characters to see if they are in punctuation
    nopunc = [char for char in list(raw_text) if char not in ppt]
    # join the characters again to form the string
    nopunc = "".join(nopunc)
    # now just remove any stopwords
    words = [word for word in nopunc.lower().split() if word.lower() not in stopwords.words("english")]
    return words

ddt = data.text[2:3].apply(text_process)
print("example: {}".format(ddt))
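
For reference, here is a minimal sketch of what seems to be going on, assuming the cause is the literal space characters inside `ppt` (there appear to be a couple between the curly quotes). If so, every space in the sentence is filtered out as if it were punctuation, and `split()` has nothing left to split on. The digits in `ppt` would likewise explain why the `2` disappears. The names `PUNCT`, `STOPWORDS`, and `text_process_fixed` are hypothetical, and the tiny stop-word set is only a stand-in for `stopwords.words("english")` so that the snippet runs without NLTK.

```python
import string

# The character set from the question, spaces included.
# A raw string is used only to avoid the invalid "\," escape warning.
ppt = r'''...!@#$%^&*()....{}’‘ “” “[]|._-`/?:;"'\,~12345678876543'''

sentence = 'is lovin Picture2Life.com!!! Y all fun apps r for iphone and not blackberry??!! '

# Because " " is in ppt, filtering against it also drops every space.
print(" " in ppt)
collapsed = "".join(ch for ch in sentence if ch not in ppt)
print(collapsed.lower())  # islovinpicturelifecomyallfunappsrforiphoneandnotblackberry

# One possible fix (an assumption, not the only option): replace punctuation
# and digits with a space instead of deleting them, so word boundaries survive.
PUNCT = string.punctuation + string.digits + "’‘“”"
STOPWORDS = {"is", "all", "for", "and", "not"}  # stand-in for the NLTK list

def text_process_fixed(raw_text):
    # Swap each unwanted character for a space, then split on whitespace.
    spaced = "".join(" " if ch in PUNCT else ch for ch in raw_text)
    return [w for w in spaced.lower().split() if w not in STOPWORDS]

print(text_process_fixed(sentence))
```

This variant still drops punctuation and digits, as the docstring describes. Keeping tokens such as `2` would mean leaving `string.digits` out of `PUNCT`.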