-6

I have a dataset of 600 responses with a "Free_Text" variable that contains each respondent's feedback/comments. I want to calculate the number of words in each respondent's comment. How should I do it? I am new to R and am working in RStudio.

lawyeR
    Please do not ask for help without [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Jun 24 '14 at 11:30

4 Answers

2

Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator for this task, which implements a sophisticated set of word-breaking rules.

library(stringi)
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.")
stri_extract_words(str)
## [[1]]
## [1] "How"   "many"  "words" "are"   "there"
## 
## [[2]]
##  [1] "R"                "язык"             "программирования" "для"              "статистической"  
##  [6] "обработки"        "данных"           "и"                "работы"           "с"               
## [11] "графикой"         "а"                "также"            "свободная"        "программная"     
## [16] "среда"            "вычислений"       "с"                "открытым"         "исходным"        
## [21] "кодом"            "в"                "рамках"           "проекта"          "GNU"   
sapply(stri_extract_words(str), length) # how many words are there in each character string?
## [1]  5 25
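For the data-frame case in the question, here is a minimal sketch. It assumes the comments sit in a character column named Free_Text; the data frame name `responses` and its two rows are made up for illustration. stringi also offers stri_count_words, which returns the counts directly without extracting the words first:

```r
library(stringi)

# Hypothetical stand-in for the real dataset; only the column name
# Free_Text comes from the question.
responses <- data.frame(
  Free_Text = c("How many words are there?", "Great service"),
  stringsAsFactors = FALSE
)

# stri_count_words() applies ICU word-break rules and returns
# one count per string, so it can be assigned straight to a new column.
responses$word_count <- stri_count_words(responses$Free_Text)
responses$word_count
```

The new word_count column then holds one value per respondent.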
gagolews
1

Splitting the string and counting the elements is a simple way to get started.

str = "This is a string."

str_length = length(strsplit(str," ")[[1]])

> str_length
[1] 4
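Splitting on a single " " produces empty elements when a comment contains repeated spaces, which inflates the count. A possible variant, sketched here on a made-up vector named `comments`, trims each string and splits on runs of whitespace, and works on a whole column at once:

```r
# Hypothetical comments; note the double space and the leading space.
comments <- c("This is a string.", "Two  spaces here", " leading space")

# Trim the ends, then split on runs of whitespace rather than a single " ",
# so repeated spaces do not produce empty "words".
words <- strsplit(trimws(comments), "\\s+")

# lengths() returns the number of elements in each list entry,
# i.e. one word count per comment.
word_counts <- lengths(words)
word_counts
```

The same two lines should work on a data frame column, for example by passing the Free_Text column in place of `comments`.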
AGS
1

Maybe this helps:

 str1 <- c("How many words are in this sentence","How many words")
 sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1
 #[1] 7 3
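One caveat with counting separators and adding 1: gregexpr returns -1 (a vector of length 1) when nothing matches, so a one-word or empty comment comes out as 2. A sketch of a base-R alternative that counts the matched words themselves (the vector name `x` is only for illustration):

```r
x <- c("How many words are in this sentence", "Hello", "")

# Separator-counting approach: with no separator present, gregexpr()
# returns -1 (length 1), so "Hello" and "" are both reported as 2.
sep_based <- sapply(gregexpr("\\W+", gsub("[[:punct:]]+", "", x)), length) + 1

# Counting the words instead: regmatches() returns character(0)
# when nothing matches, so an empty string gets a count of 0.
match_based <- lengths(regmatches(x, gregexpr("\\w+", x)))
```

Here `sep_based` and `match_based` agree on the first sentence and differ on the one-word and empty strings.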

Also,

 library(qdap)
 word_count(str1)
#[1] 7 3

 str2 <- "How many words?."  
 word_count(str2)
 #[1] 3
akrun
0

And, one more method, using the stringr package, to list individual words:

library(stringr)
str1 <- c("How many words are in this sentence","How many words")
# "\\S+" matches each run of non-whitespace characters; each match becomes one row of the result matrix
sapply(str_match_all(str1, "\\S+"), nrow) # words per string; length(unlist(...)) would give only the total across all strings
lawyeR