I have this data frame:
> str(final)
'data.frame': 112 obs. of 3 variables:
$ FAO_CountryName: chr Algeria Egypt Libya Morocco ...
$ FAO_CountryURL : chr "http://www.fao.org/giews/countrybrief/country.jsp?code=DZA" "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY" "http://www.fao.org/giews/countrybrief/country.jsp?code=LBY" "http://www.fao.org/giews/countrybrief/country.jsp?code=MAR" ...
$ Text : chr "\r\n Reference Date: 24-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 28-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 15-November-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ "\r\n Reference Date: 21-September-2016\r\n \r\n \r\n FOOD SECURITY SNAPSHOT\r\n \r\n "| __truncated__ ...
I would like to work on the Text variable so that I can, for instance, count how many times each word appears in it, row by row. In other words, I would like to get a data frame like the following:
> head(final, n=2)
FAO_CountryName  FAO_CountryURL       Text                Word     Count
Algeria          http://www.fao.org…  Algeria is nice…    Algeria  1
                                                          is       1
                                                          ...
Egypt            http://www.fao.org…  Egypt is nice too…  Egypt    1
                                                          is       5
                                                          ...
So far, I have done this:
library(dplyr)
library(tidytext)

## Counting the words included in the textual dataset.
keywords <- text_df %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE) %>%
  ungroup()
## Summing the frequencies per word (i.e. how many times each word is present overall)
total_words <- keywords %>%
  group_by(word) %>%
  summarize(total = sum(n))
However, this only gives me the word counts over the whole column, not row by row. Any suggestions?
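To make the goal concrete, here is a minimal sketch of the per-row result I am after, assuming `dplyr` and `tidytext` are installed. The two-row data frame `toy` is a made-up stand-in for `final`, and `per_row` is just an illustrative name. Passing the country name to `count()` together with `word` appears to keep the counts separate for each row, but I have not verified this on the full dataset.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for `final` (not the real FAO text)
toy <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  Text = c("Algeria is nice", "Egypt is nice too, is it not"),
  stringsAsFactors = FALSE
)

# One row per (country, word) pair, with n = occurrences within that country's text
per_row <- toy %>%
  unnest_tokens(word, Text) %>%          # one lowercased token per row
  count(FAO_CountryName, word, sort = TRUE)

print(per_row)
```

If several rows could share the same country name, a row identifier column (for example one created with `mutate(row_id = row_number())` before tokenizing) would probably be a safer grouping key than the name.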