Here is my attempt at a word count for a single column using groupby with pandas.
First, set up the data:
import numpy as np
import pandas as pd

columns = ['col1', 'col2', 'col3']
data = np.array([['word1', 'word2', 'word3'], ['word1', 'word5', 'word3'], ['word3', 'word7', 'word3']])
to_count = pd.DataFrame(data, columns=columns)
I'm attempting to count the words in col1 of to_count. to_count contains:
    col1   col2   col3
0  word1  word2  word3
1  word1  word5  word3
2  word3  word7  word3
To count the words, I then use:
print(to_count.groupby('col1').count())
which displays:
       col2  col3
col1
word1     2     2
word3     1     1
This seems partly correct in that the word counts are returned, but the same count is repeated in each of the remaining columns. How can I access the word count for a single column? I could just select a single column from the resulting DataFrame, but that does not seem correct.
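For reference, here is a minimal sketch of two approaches that might give a single count per word, assuming the goal is simply the number of rows for each distinct value in col1. The names counts_vc and counts_size are only illustrative. Both calls return a Series rather than a DataFrame with one count per remaining column.

```python
import pandas as pd

# Same data as above, built directly from a list of rows.
to_count = pd.DataFrame(
    [["word1", "word2", "word3"],
     ["word1", "word5", "word3"],
     ["word3", "word7", "word3"]],
    columns=["col1", "col2", "col3"],
)

# Option 1: count occurrences of each distinct value in col1.
# value_counts() sorts by frequency and skips NaN by default.
counts_vc = to_count["col1"].value_counts()

# Option 2: group by col1 and take the number of rows in each group.
# Unlike count(), size() does not depend on the other columns.
counts_size = to_count.groupby("col1").size()

print(counts_vc)
print(counts_size)
```

The difference from count() is that count() reports the non-null entries of every other column separately, whereas size() reports one number per group. If a DataFrame is preferred over a Series, calling reset_index(name="count") on the size() result should give one.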