Given a data frame like this one:

data.frame(id = c(1,2,3), text = c("my text here", "another the here but different", "no text"))

How can I count the number of words in each row and drop the rows that have two or fewer words?

Expected output

data.frame(id = c(1,2), text = c("my text here", "another the here but different"))
Nathalie

3 Answers

One option utilizing the stringr library could be:

library(stringr)

# Keep only the rows whose text has a third word
df[!is.na(word(df$text, 3)), ]

  id                           text
1  1                   my text here
2  2 another the here but different

Or another option using the stringr library (provided by @Sotos):

df[str_count(df$text, fixed(" ")) >= 2, ]
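To make the two stringr options easier to compare, here is a minimal sketch that prints the intermediate values on the question's sample data. It assumes stringr is installed; the names `third_word` and `n_words`, and the `"\\S+"` pattern (runs of non-whitespace), are illustrative choices rather than part of the answer above.

```r
library(stringr)

df <- data.frame(id = c(1, 2, 3),
                 text = c("my text here", "another the here but different", "no text"),
                 stringsAsFactors = FALSE)

# word(x, 3) extracts the third space-separated word, or NA when there is none.
third_word <- word(df$text, 3)

# Assumption: counting runs of non-whitespace gives the number of words directly,
# instead of counting the spaces between them.
n_words <- str_count(df$text, "\\S+")

# Keep rows with more than two words.
df[n_words > 2, ]
```

Counting words rather than spaces may be a little more forgiving when the text contains repeated spaces, although that case does not occur in the sample data.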
tmfmnk
    I love the `word()` function, but I think if you are going to load `stringr`, then `str_count()` might be more straightforward, i.e. `df[str_count(df$text, ' ') >= 2,]` – Sotos Jan 22 '20 at 12:56

Here is a base R solution using gregexpr() + lengths() + subset():

dfout <- subset(df, lengths(gregexpr("[[:alpha:]]+", df$text)) > 2)

which gives

> dfout
  id                           text
1  1                   my text here
2  2 another the here but different
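If the word counts themselves are needed, one caveat is that `gregexpr()` returns a single `-1` for a string with no match, so `lengths()` reports 1 rather than 0 in that case. A possible sketch that avoids this by going through `regmatches()` follows; `count_words` is a hypothetical helper name, not something from the answer.

```r
# Hypothetical helper: count runs of alphabetic characters in each string.
count_words <- function(x) {
  x <- as.character(x)  # the DATA below stores text as a factor
  # regmatches() yields character(0) for strings without a match,
  # so lengths() is 0 for them instead of 1.
  lengths(regmatches(x, gregexpr("[[:alpha:]]+", x)))
}

df <- data.frame(id = c(1, 2, 3),
                 text = c("my text here", "another the here but different", "no text"))

df$n_words <- count_words(df$text)

# Keep rows with more than two words.
df[df$n_words > 2, ]
```

For the filter in this question the difference does not matter, because both 0 and 1 fall below the threshold.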

DATA

df <- structure(list(id = c(1, 2, 3), text = structure(c(2L, 1L, 3L
), .Label = c("another the here but different", "my text here", 
"no text"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))
ThomasIsCoding

You can use strsplit and lengths to find where you have more than 2 words.

df[lengths(strsplit(as.character(df$text), "\\b ")) > 2,]
#  id                           text
#1  1                   my text here
#2  2 another the here but different

df[lengths(strsplit(as.character(df$text), "\\W+")) > 2,] # Alternative: split on runs of non-word characters

Or, using gregexpr to count the separators between words instead of the words themselves:

df[lengths(gregexpr("\\W+", df$text)) > 1,]
  id                           text
1  1                   my text here
2  2 another the here but different
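The thresholds differ (`> 2` versus `> 1`) because `strsplit()` yields the words while `gregexpr()` yields the separators between them. A small sketch on the sample strings, with `n_words` and `n_seps` as illustrative names, shows the relationship:

```r
text <- c("my text here", "another the here but different", "no text")

# Words per string: split on runs of non-word characters.
n_words <- lengths(strsplit(text, "\\W+"))

# Separators per string: gregexpr() gives one position per match,
# or a single -1 when nothing matches, so count only real positions.
n_seps <- vapply(gregexpr("\\W+", text), function(m) sum(m > 0), integer(1))

data.frame(text, n_words, n_seps)
```

For these strings, which have no leading or trailing separators, each row has one more word than it has separators.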

Have a look at "Count the number of all words in a string".

GKi