
I have this data frame:

    > str(final)
    'data.frame':   112 obs. of  3 variables:
     $ FAO_CountryName: chr  "Algeria" "Egypt" "Libya" "Morocco" ...
     $ FAO_CountryURL : chr  "http://www.fao.org/giews/countrybrief/country.jsp?code=DZA" "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY" "http://www.fao.org/giews/countrybrief/country.jsp?code=LBY" "http://www.fao.org/giews/countrybrief/country.jsp?code=MAR" ...
     $ Text           : chr  "\r\n   Reference Date: 24-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 28-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 15-November-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n          "| __truncated__ "\r\n   Reference Date: 21-September-2016\r\n   \r\n   \r\n               FOOD SECURITY SNAPSHOT\r\n               \r\n         "| __truncated__ ...

I would like to work on the Text variable so that I can, for instance, count how many times each word appears in it, row by row. In other words, I would like to get a data frame like the following:

    > head(final, n=2)
      FAO_CountryName   FAO_CountryURL        Text                   WordCount
      Algeria           http://www.fao.org…   Algeria is nice…       Algeria  1
                                                                     is       1
                                                                     ...
      Egypt             http://www.fao.org…   Egypt is nice too…     Egypt    1
                                                                     is       5
                                                                     ...
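One way to get the per-row layout sketched above, using only base R, is to store one frequency table per row in a list-column named `WordCount`. This is a minimal sketch, not a tested answer on the real data: the two-row toy data frame, the helper `count_words()` and the tokenization rule (lowercase everything, split on any run of characters that are not letters or digits) are all assumptions made for illustration.

```r
# Hypothetical toy data standing in for `final` (the real Text values are truncated above)
final <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  FAO_CountryURL  = c("http://www.fao.org/giews/countrybrief/country.jsp?code=DZA",
                      "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY"),
  Text = c("Algeria is nice", "Egypt is nice too, Egypt is big"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowercase one text, split it into words, tabulate the words
count_words <- function(txt) {
  words <- unlist(strsplit(tolower(txt), "[^[:alnum:]]+"))
  words <- words[words != ""]   # drop empty strings left by leading separators such as "\r\n"
  sort(table(words), decreasing = TRUE)
}

# One frequency table per row, kept in a list-column so each country keeps its own counts
final$WordCount <- lapply(final$Text, count_words)

final$WordCount[[2]]   # word counts for the second row (Egypt)
```

Because `WordCount` is a list-column, `final` keeps exactly one row per country, which matches the desired output above.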

So far, I have done this:

    library(dplyr)
    library(tidytext)

    ## Counting the words included in the textual dataset.
    keywords <- final %>%
      unnest_tokens(word, Text) %>%   # one row per word token
      count(word, sort = TRUE) %>%    # counts across ALL rows combined
      ungroup()

    ## Scoring the textual frequencies into the textual dataset (i.e. how many times the words are present)
    total_words <- keywords %>%
      group_by(word) %>%
      summarize(total = sum(n))

However, this way I only get the word counts for the WHOLE column, NOT row by row. Any suggestions?
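A minimal sketch of how the tidytext pipeline could keep counts per row: `count()` only groups by the columns it is given, so adding `FAO_CountryName` next to `word` yields one count per country and word. This assumes each country appears in exactly one row of `final`; the two-row toy data frame below is hypothetical and only mimics the column names shown by `str(final)`.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidytext)
})

# Hypothetical toy data with the same column names as `final`
final <- data.frame(
  FAO_CountryName = c("Algeria", "Egypt"),
  FAO_CountryURL  = c("http://www.fao.org/giews/countrybrief/country.jsp?code=DZA",
                      "http://www.fao.org/giews/countrybrief/country.jsp?code=EGY"),
  Text = c("Algeria is nice", "Egypt is nice too, Egypt is big"),
  stringsAsFactors = FALSE
)

word_counts <- final %>%
  unnest_tokens(word, Text) %>%               # one row per token; the other columns are kept
  count(FAO_CountryName, word, sort = TRUE)   # count within each country, not over the whole column

word_counts
```

The result is in long format (one row per country and word). If one row per country is needed, as in the desired output, the counts could be nested back into a list-column, for example with `tidyr::nest()`.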

Jaap
Ileeo
  • Including a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) would be nice. – Jaap Feb 20 '17 at 11:57
  • Have you checked [rowwise](https://www.rdocumentation.org/packages/dplyr/versions/0.5.0/topics/rowwise) from `dplyr`? – Aramis7d Feb 20 '17 at 12:04
  • It does not work with `rowwise`... – Ileeo Feb 20 '17 at 13:51
