3

I am trying to count, for each row of a data frame, how many values match a given word, for example how many values in each row match 'corn'. Here is the input.

install.packages('stringr')
library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
y <- c("corn", "corn", "mean", "meal")
id <- c(1, 2, 3, 4)
dataset <- data.frame(id,dataset,y)

  id         dataset    y
1  1            corn corn
2  2        cornmeal corn
3  3 corn on the cob mean
4  4            meal meal

I am trying to get output like this:

  id         dataset    y corn meal
1  1            corn corn    2    0
2  2        cornmeal corn    1    0
3  3 corn on the cob mean    0    0
4  4            meal meal    0    2
akrun
user3570187
  • 3
    It's a simple `rowSums` operation. Do you want a column for each word in `dataset` or `y`? – David Arenburg Jun 06 '15 at 19:23
  • I want only one column for each word, as shown above. But I have a large set of variables, like V1 : V100, and I need to create columns like corn, meal, etc. – user3570187 Jun 06 '15 at 19:30
  • I got this error: `Error in rowSums(dataset, na.rm = FALSE, dims = 1) : 'x' must be numeric` – user3570187 Jun 06 '15 at 19:31
  • 3
    As @DavidArenburg commented, this can be done with `rowSums`, i.e. `dataset[c('corn', 'meal')] <- sapply(c('corn', 'meal'), function(x) rowSums(dataset[-1]==x))`. You can create a vector of names, i.e. `v1 <- c('corn', 'meal',...)`, and then loop over it using `sapply` – akrun Jun 06 '15 at 19:31

1 Answer

4

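One option is `rowSums`. Calling it on the data frame itself fails with "'x' must be numeric" because the columns hold text. Comparing the columns with a word first gives a logical matrix, and `rowSums` then counts the TRUE values in each row. We create a vector of the words to match and then create one column per word, named after that word.

# Words to count; each one becomes a new column
v1 <- c('corn', 'meal')
# dataset[-1] == x is a logical matrix (id column dropped); rowSums() counts TRUE per row
dataset[v1] <- sapply(v1, function(x) rowSums(dataset[-1] == x))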
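
As a self-contained sketch of the same idea, the snippet below rebuilds the data frame from the question and checks the counts. It assumes exact whole-cell matches are wanted, so "cornmeal" does not count as "corn", which agrees with the expected output in the question. The name `cols` is hypothetical and introduced only for illustration: it holds the columns to search, and with many columns such as V1 to V100 it could hold those names instead.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps text as character
dataset <- data.frame(
  id      = 1:4,
  dataset = c("corn", "cornmeal", "corn on the cob", "meal"),
  y       = c("corn", "corn", "mean", "meal"),
  stringsAsFactors = FALSE
)

v1   <- c("corn", "meal")   # words to count
cols <- c("dataset", "y")   # hypothetical: names of the text columns to search

# For each word, compare every selected cell with it (exact match only)
# and count the TRUE values in each row.
dataset[v1] <- sapply(v1, function(x) rowSums(dataset[cols] == x))

print(dataset)
# corn column: 2 1 0 0
# meal column: 0 0 0 2
```

Selecting columns by name rather than with `dataset[-1]` keeps the comparison from touching columns added later, such as the new count columns themselves if the line is run twice.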
akrun