Splitting words in a column

Question

I have a CSV file with a msg column that contains the following text:

muchloveandhugs
dudeseriously
onemorepersonforthewin
havefreebiewoohoothankgod
thisismybestcategory
yupbabe
didfreebee
heykidforget
hecomplainsaboutit

I know that nltk.corpus.words contains a list of sensible English words. How do I apply it to the df['msg'] column so that each string is split into words, like this:

df['msg']
much love and hugs
dude seriously
one more person for the win

The problem is broad and not well defined. For example, is `someone` one word or `some one`? You should share your existing code so there's somewhere to start with. — jpp, Oct 15 '18 at 14:44
This is a complicated problem and prone to error since it relies heavily on probability. I found [this link](http://nbviewer.jupyter.org/url/norvig.com/ipython/How%20to%20Do%20Things%20with%20Words.ipynb#(5)-Task:-Word-Segmentation) that suggests an approach. Personally, I'd be tempted to just ask Google; it will split up such strings and offer a "do you mean" link. — kindall, Oct 15 '18 at 14:48

score 2 · Accepted Answer · answered Oct 15 '18 at 15:06

Borrowing from this question about splitting words in strings with no spaces, and without knowing exactly what your data looks like, something like this should work:

import pandas as pd
import wordninja

filename = 'mycsv.csv' # Put your filename here

df = pd.read_csv(filename)
for wordstring in df['msg']:
    # wordninja.split returns the words as a list of strings
    split = wordninja.split(wordstring)
    # Do something with split
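
If you would rather drive the split from your own word list (for example one built from nltk.corpus.words), the idea can be sketched in plain Python. This is only a minimal illustration, not how wordninja works: it picks the split with the fewest dictionary words instead of using word probabilities, and the VOCAB set and the segment function are hypothetical names invented for this example.

```python
from typing import List, Optional, Set

# Hypothetical toy vocabulary (an assumption for this sketch). In practice it
# could be built from a larger word list such as nltk.corpus.words.words().
VOCAB: Set[str] = {
    "much", "love", "and", "hugs", "dude", "seriously",
    "one", "more", "person", "for", "the", "win",
}


def segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split text into the fewest vocabulary words; return None if impossible."""
    n = len(text)
    max_len = max(map(len, vocab), default=0)
    # best[i] holds the shortest word list covering text[:i], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        # Only look back as far as the longest word in the vocabulary.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in vocab:
                candidate = prefix + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


messages = ["muchloveandhugs", "dudeseriously", "onemorepersonforthewin"]
for msg in messages:
    words = segment(msg.strip(), VOCAB)
    # Fall back to the original string when no split is found.
    print(" ".join(words) if words else msg)
```

To apply it to the column, something like df['msg'].apply(lambda s: segment(s.strip(), VOCAB)) should work, assuming every value is a string. Note that the fewest-words rule always prefers "someone" over "some one", which may or may not be what you want; a frequency-based scorer handles such cases more gracefully.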
