Count the number of words in a string that starts with white space

Question

I have the following string:

str1<-" india hit milestone electricity wind solar"

Counting the pieces after splitting on single spaces gives:

> sapply(strsplit(str1, " "), length)
[1] 7

This is not correct: because str1 starts with a space, strsplit() returns an empty string "" as the first element, so the count is 7 instead of 6. I tried to trim the white space with:

> stripWhitespace(str1) # from the tm package

but it returns the same string:

[1] " india hit milestone electricity wind solar"

Why?

Where is `stripWhitespace` defined? My first thought would be to use `trimws` to remove the leading (and trailing, if present) spaces. That would make `sapply(strsplit(trimws(str1), " "), length)` return 6. — r2evans, Apr 15 '20 at 15:52
stripWhitespace is for use in documents in a corpus set up using the tm package, not for strings. If you just have a string, use trimws as mentioned above. — Lynn L, Apr 15 '20 at 15:55
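A likely reason `stripWhitespace` leaves the string unchanged: as far as I know, tm's default method collapses runs of white space into a single space rather than trimming them, roughly `gsub("[[:space:]]+", " ", x)`. A single leading space is already a run of length one, so it survives. The sketch below assumes that equivalence and contrasts it with `trimws`:

```r
str1 <- " india hit milestone electricity wind solar"

# Assumed equivalent of tm's stripWhitespace() default method:
# every run of white space becomes exactly one space (nothing is trimmed)
collapsed <- gsub("[[:space:]]+", " ", str1)

# Base R trimws() removes leading and trailing white space instead
trimmed <- trimws(str1)

collapsed  # still starts with a space
trimmed    # no leading space
```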

score 4 · Accepted Answer · answered Apr 15 '20 at 15:52

You can use the base R function `trimws` to drop the leading space before splitting:

sapply(strsplit(trimws(str1), " "), length)
[1] 6
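Splitting on a single `" "` still breaks if words are separated by more than one space, because each extra space produces an empty string. A slightly more robust sketch trims first and then splits on the regex `"\\s+"` (one or more white-space characters), using `lengths` to get one count per string; the second input string is made up for illustration:

```r
str1 <- " india hit milestone electricity wind solar"
str2 <- "  wind   and solar  "  # hypothetical input with repeated spaces

# trimws() drops the outer white space; "\\s+" treats any run of spaces,
# tabs or newlines as a single separator, so no empty strings are produced
n_words <- lengths(strsplit(trimws(c(str1, str2)), "\\s+"))
n_words
```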

answered Apr 15 '20 at 15:52

phiver

score 1 · Answer 2 · answered Apr 15 '20 at 15:52

Maybe you can try counting the regex matches for whole words with `gregexpr` instead of splitting:

> lengths(gregexpr("\\b\\w+\\b", str1))
[1] 6
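One caveat with `lengths(gregexpr(...))`: when a string contains no match at all, `gregexpr` returns `-1`, which still has length 1, so an all-blank string would be counted as one word. A small sketch that counts only real matches; `count_words` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: number of "\\w+" matches per element of x
count_words <- function(x) {
  matches <- gregexpr("\\w+", x)
  # gregexpr() reports -1 for "no match", so count only positive positions
  vapply(matches, function(pos) sum(pos > 0L), integer(1))
}

count_words(c(" india hit milestone electricity wind solar", "   "))
```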

answered Apr 15 '20 at 15:52

ThomasIsCoding

score 0 · Answer 3 · answered Apr 15 '20 at 15:55

You could try using `stringr::str_trim` and `stringr::str_split` like this:

length(stringr::str_split(stringr::str_trim(str1), pattern = " ", simplify = TRUE))
[1] 6
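If the input may also contain repeated spaces between words, stringr's `str_squish` (which trims both ends and collapses internal white space) can stand in for `str_trim`. A sketch on a vector of strings, using `lengths` on the list returned by `str_split` so each string gets its own count; the second string is made up for illustration:

```r
library(stringr)

x <- c(" india hit milestone electricity wind solar",
       "wind  and   solar ")  # made-up example with extra spaces

# str_squish() trims the ends and reduces each internal run of white space
# to one space, so splitting on " " no longer produces empty strings
n_words <- lengths(str_split(str_squish(x), " "))
n_words
```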

answered Apr 15 '20 at 15:55

user438383

score 0 · Answer 4 · answered Apr 15 '20 at 17:24

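We can use `str_count` from stringr, which counts the matches of `\\w+` (runs of word characters) and therefore ignores the leading space: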

library(stringr)
str_count(str1, '\\w+')
#[1] 6
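Note that the result depends on what the pattern treats as a word. `\\w+` matches runs of word characters (letters, digits, underscore and similar), so a contraction such as "don't" is counted as two words, while `\\S+` counts runs of non-white-space characters. A quick comparison (the second string is made up for illustration):

```r
library(stringr)

x <- c(" india hit milestone electricity wind solar", "don't stop")

# "\\w+": runs of word characters; the apostrophe splits "don't" in two
by_word_chars <- str_count(x, "\\w+")
# "\\S+": runs of non-white-space characters, i.e. space-separated tokens
by_tokens <- str_count(x, "\\S+")

by_word_chars
by_tokens
```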

answered Apr 15 '20 at 17:24

akrun
