I have a dataset with 600 responses with a "Free_Text" variable which contains the feedback/comments from the respondents. Now I want to calculate the number of words in the comments for each respondent. How should I do it? I am a new learner of R and am working on R studio.
Please do not ask for help without [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Jun 24 '14 at 11:30
4 Answers
2
Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator for this task, which implements a sophisticated set of word-breaking rules.
library(stringi)
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.")
stri_extract_words(str)
## [[1]]
## [1] "How" "many" "words" "are" "there"
##
## [[2]]
## [1] "R" "язык" "программирования" "для" "статистической"
## [6] "обработки" "данных" "и" "работы" "с"
## [11] "графикой" "а" "также" "свободная" "программная"
## [16] "среда" "вычислений" "с" "открытым" "исходным"
## [21] "кодом" "в" "рамках" "проекта" "GNU"
sapply(stri_extract_words(str), length) # how many words are there in each character string?
## [1] 5 25
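If only the counts are needed, stringi also provides stri_count_words, which applies the same word-boundary rules without building the list of words first. A minimal sketch, assuming a hypothetical data frame called responses; only the column name Free_Text comes from the question, and the sample rows are made up for illustration:

```r
library(stringi)

# Hypothetical stand-in for the real dataset; only the column name
# Free_Text is taken from the question.
responses <- data.frame(
  Free_Text = c("How many words are there?", ""),
  stringsAsFactors = FALSE
)

# One count per row, stored next to the comments
responses$n_words <- stri_count_words(responses$Free_Text)
responses$n_words
```

As far as I can tell, an empty comment is counted as 0 this way, whereas stri_extract_words returns NA for a string with no words, which sapply(..., length) would count as 1.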

gagolews
1
Splitting the string and counting the elements is a simple way to get started.
str <- "This is a string."
str_length <- length(strsplit(str, " ")[[1]])
str_length
## [1] 4
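Splitting on a single space counts extra empty elements when a comment contains repeated spaces, so for a whole column it may be safer to trim each comment and split on runs of whitespace. A base-R sketch, assuming a hypothetical data frame called responses with made-up rows standing in for the real data (only the column name Free_Text comes from the question):

```r
# Hypothetical stand-in for the 600-response dataset
responses <- data.frame(
  id = 1:3,
  Free_Text = c("Great service, thank you!", "  Too   slow ", ""),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, then split on one or more
# whitespace characters; each list element holds one comment's words.
words <- strsplit(trimws(responses$Free_Text), "\\s+")

# lengths() returns the number of elements in each list entry
responses$word_count <- lengths(words)
responses$word_count
## [1] 4 2 0
```

A missing comment (NA) would be counted as 1 here, so those rows may need separate handling.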

AGS
1
Maybe this helps:
str1 <- c("How many words are in this sentence","How many words")
sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1
#[1] 7 3
Also,
library(qdap)
word_count(str1)
#[1] 7 3
str2 <- "How many words?."
word_count(str2)
#[1] 3

akrun
0
And one more method, using the stringr package, to extract the individual words and count them:
library(stringr)
str1 <- c("How many words are in this sentence","How many words")
# "\\S+" matches runs of non-whitespace characters; unlist() pools the matches, so this is the total across all strings
length(unlist(str_match_all(str1, "\\S+")))

lawyeR