I am cleaning data from a .txt source. Every line of the file contains one WhatsApp message, including its date and time stamp. I have already split this into one column holding the date and time information, df['text'], and one column holding the message text, df['text_new']. Based on this I want to create a word cloud, which is why I need every word from the conversations as a separate entry (one word per row) in a pandas DataFrame.
I need your help with further cleaning and transforming this data.
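A word cloud is ultimately driven by word frequencies, so once there is one word per row, pandas `value_counts` is enough to get them. The sketch below assumes the cleaning has already produced such a column; the `words` values are made up for illustration. If the third-party `wordcloud` package is used (an assumption, since the question does not name a library), its `WordCloud.generate_from_frequencies` method accepts a dict of this shape.

```python
import pandas as pd

# Hypothetical output of the cleaning steps: one lower-case word (or emoji) per row.
words = pd.Series(['how', 'are', 'you', 'i', 'am', 'fine', 'you', '😀'], name='word')

# Count how often each word occurs; value_counts sorts most frequent first.
freq = words.value_counts()

# Plain {word: count} dict, the form a frequency-based word-cloud generator takes.
freq_dict = freq.to_dict()
print(freq_dict)
```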
Suppose the DataFrame column df['text_new'] looks like this:
0 How are you?
1 I am fine, we should meet this afternoon!
2 Okay let us do that.
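To make the example reproducible, the sample column above can be rebuilt like this. Only `text_new` is recreated; the date/time column is left out because its exact format is not shown.

```python
import pandas as pd

# The three sample messages from the question, in a column named like the original.
df = pd.DataFrame({'text_new': [
    'How are you?',
    'I am fine, we should meet this afternoon!',
    'Okay let us do that.',
]})
print(df)
```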
What do I want to do?
- Remove all punctuation from the text.
- Split the messages into separate words, so that each DataFrame entry holds exactly one word.
- If possible, treat each smiley (emoji) as a single word. If that is not possible, how can I remove them?
- Convert all text to lower case. I already have a solution for that, but it would be nice to include it in the cleaning code.
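The steps above can be combined into one pandas chain: lower-case the text, extract tokens with a regular expression, and `explode` the resulting lists so each token gets its own row. This is a minimal sketch under a few assumptions. The emoji character class covers only some common Unicode emoji blocks (it is not a complete emoji definition), so a skin-tone modifier becomes its own token and text smileys such as `:)` are dropped like other punctuation. The fourth message containing emojis is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({'text_new': [
    'How are you?',
    'I am fine, we should meet this afternoon!',
    'Okay let us do that.',
    'See you 😀👍',  # hypothetical message, added only to show emoji handling
]})

# A token is either one emoji code point from a few common emoji blocks
# (an assumed, incomplete range) or a run of word characters. Punctuation and
# whitespace never match, so they are removed as a side effect of findall.
TOKEN_PATTERN = r'[\U0001F300-\U0001FAFF\u2600-\u27BF]|\w+'

words = (
    df['text_new']
    .str.lower()                 # lower case
    .str.findall(TOKEN_PATTERN)  # drop punctuation, split into words, keep emojis
    .explode()                   # one token per row
    .dropna()                    # messages without any token explode to NaN
    .reset_index(drop=True)
)
words_df = words.to_frame(name='word')
print(words_df)
```

To remove emojis instead of keeping them, the pattern can be reduced to `r'\w+'`. Note that `\w` is Unicode-aware in Python 3, so letters such as `ä` or `ß` are kept as part of words.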
Now that you know the four steps I want to run, maybe someone has a clean and neat way to do this.
Thank you all in advance!