I have a pandas dataframe df of the form:
import pandas as pd

df = pd.DataFrame.from_dict({'ID': [1, 2, 3],
                             'Strings': ['Hello, how are you?', 'Nice to meet you!', 'My name is John.']})
I want to tokenize the Strings column and create a new data frame new_df:
Sentence    Word
0           Hello
0           ,
0           how
0           are
0           you
0           ?
1           Nice
1           to
1           meet
1           you
1           .
2           My
2           name
2           is
2           John
2           .
I know that for tokenization I can use nltk.word_tokenize() on every string in df, but how do I get from that point to new_df efficiently?
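For context, here is a rough sketch of the kind of thing I was imagining (assuming nltk's 'punkt' tokenizer data is downloaded and a pandas version that has Series.explode, i.e. 0.25 or newer); I am just not sure whether apply plus explode is the most efficient way to do this:

import pandas as pd
from nltk.tokenize import word_tokenize  # requires nltk's 'punkt' data

df = pd.DataFrame.from_dict({'ID': [1, 2, 3],
                             'Strings': ['Hello, how are you?',
                                         'Nice to meet you!',
                                         'My name is John.']})

# Tokenize each string into a list of tokens, then explode so that each
# token gets its own row; the repeated index becomes the 'Sentence' column.
new_df = (df['Strings']
          .apply(word_tokenize)   # Series of token lists, one per sentence
          .explode()              # one token per row, original index repeated
          .reset_index()          # turn the index into a regular column
          .rename(columns={'index': 'Sentence', 'Strings': 'Word'}))

print(new_df)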