1

I have the following string:

str1<-" india hit milestone electricity wind solar"

I count the number of words it contains like this:

> sapply(strsplit(str1, " "), length)
[1] 7

That is wrong: the string has 6 words, and the extra element comes from the space at the beginning of str1. I tried to trim the whitespace, but:

> stripWhitespace(str1) # from the tm package

returns the string unchanged:

[1] " india hit milestone electricity wind solar"

Why?

Mark
  • Where is `stripWhitespace` defined? My first thought was to use `trimws` to remove the leading (and trailing, if present) spaces. That would make `sapply(strsplit(trimws(str1), " "), length)` return 6. – r2evans Apr 15 '20 at 15:52
  • stripWhitespace is for use in documents in a corpus set up using the tm package, not for strings. If you just have a string, use trimws as mentioned above. – Lynn L Apr 15 '20 at 15:55

4 Answers

4

You can just use the base function trimws:

sapply(strsplit(trimws(str1), " "), length)
[1] 6
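
For context on why the original attempts give 7, here is a minimal base R sketch. It assumes, going by the tm documentation, that stripWhitespace only collapses runs of whitespace into a single blank; the gsub call is a stand-in for that behaviour, not tm's actual code. Splitting on "\\s+" instead of " " is an extra precaution that is not part of the answer above.

```r
str1 <- " india hit milestone electricity wind solar"

# The leading space makes strsplit() emit an empty first element.
parts <- strsplit(str1, " ")[[1]]
parts[1] == ""      # TRUE
length(parts)       # 7

# Stand-in for collapsing whitespace runs (assumed to mirror what
# tm::stripWhitespace does): a single leading space is left in place.
collapsed <- gsub("[[:space:]]+", " ", str1)
identical(collapsed, str1)   # TRUE

# trimws() removes leading/trailing whitespace; splitting on "\\s+"
# additionally tolerates repeated spaces between words.
n_words <- length(strsplit(trimws(str1), "\\s+")[[1]])
n_words             # 6
```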
phiver
1

You can also count the word matches directly:

lengths(gregexpr("\\b\\w+\\b", str1))

which gives

> lengths(gregexpr("\\b\\w+\\b", str1))
[1] 6
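
One caveat with this approach: when nothing matches, gregexpr returns a single -1, so lengths() reports 1 rather than 0 for an empty string. A possible workaround is sketched below; count_words is a hypothetical helper name, not something from base R or from this answer.

```r
# Hypothetical helper (not from the original answer): counts word
# matches per string, treating the "no match" marker -1 as zero.
count_words <- function(x) {
  matches <- gregexpr("\\b\\w+\\b", x)
  vapply(matches, function(pos) sum(pos > 0), integer(1))
}

count_words(" india hit milestone electricity wind solar")  # 6
count_words("")                                             # 0
lengths(gregexpr("\\b\\w+\\b", ""))                         # 1
```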
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

You could try stringr::str_trim together with stringr::str_split, like this:

length(stringr::str_split(stringr::str_trim(str1), pattern = " ", simplify = TRUE))
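
If the string might also contain repeated spaces between words, str_trim alone would still leave empty pieces after the split. A minimal sketch of one way around that, assuming stringr is installed and using str_squish (which is not part of the answer above) on a made-up input:

```r
library(stringr)

x <- "  india   hit milestone "   # made-up input with repeated spaces

# str_squish() trims both ends and collapses internal whitespace runs,
# so splitting on a single space no longer yields empty strings.
words <- str_split(str_squish(x), " ")[[1]]
length(words)   # 3
```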
user438383
0

We can use str_count from stringr:

library(stringr)
str_count(str1, '\\w+')
#[1] 6
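
str_count is vectorised and returns 0 when there is no match, so it handles empty strings without special casing. Note that '\\w+' treats apostrophes and hyphens as separators; if whitespace-delimited tokens are what you want, '\\S+' may be closer. A small sketch on made-up inputs:

```r
library(stringr)

x <- c(" india hit", "", "don't stop")   # made-up inputs

str_count(x, "\\w+")   # 2 0 3  ("don't" is split into "don" and "t")
str_count(x, "\\S+")   # 2 0 2  (runs of non-whitespace)
```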
akrun