Since you are probably looking for combinations of fruit flavors in text that also includes non-fruit words, I've made up some documents similar to those in your example. I've used the quanteda package to construct a document-feature matrix and then keep only the ngrams that contain the fruit words.
docs <- c("One flavor is apple strawberry lime.",
"Another flavor is apple grape lime.",
"Pineapple mango guava is our newest flavor.",
"There is also kiwi guava and grape apple.",
"Mixed berry was introduced last year.",
"Did you like kiwi guava pineapple?",
"Try the lime mixed berry.")
flavorwords <- c("apple", "guava", "berry", "kiwi", "grape")
library(quanteda)
# form a document-feature matrix of unigrams, bigrams, and trigrams,
# ignoring common stopwords plus "like" and "also"
fruitDfm <- dfm(docs, ngrams = 1:3, ignoredFeatures = c("like", "also", stopwords("english")))
## Creating a dfm from a character vector ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 7 documents
## ... indexing features: 90 feature types
## ... removed 47 features, from 176 supplied (glob) feature types
## ... complete.
## ... created a 7 x 40 sparse dfm
## Elapsed time: 0.01 seconds.
# keep only the features that match any of the flavorwords, treated as regular expressions
fruitDfm <- selectFeatures(fruitDfm, flavorwords, valuetype = "regex")
## kept 22 features, from 5 supplied (regex) feature types
# show the features
topfeatures(fruitDfm, nfeature(fruitDfm))
##                 apple                 guava                 grape             pineapple                  kiwi
##                     3                     3                     2                     2                     2
##            kiwi_guava                 berry           mixed_berry            strawberry      apple_strawberry
##                     2                     2                     2                     1                     1
##       strawberry_lime apple_strawberry_lime           apple_grape            grape_lime      apple_grape_lime
##                     1                     1                     1                     1                     1
##       pineapple_mango           mango_guava pineapple_mango_guava           grape_apple       guava_pineapple
##                     1                     1                     1                     1                     1
##  kiwi_guava_pineapple      lime_mixed_berry
##                     1                     1
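Note that "pineapple" shows up in this output even though it is not in flavorwords: with regex matching, the pattern "apple" matches anywhere inside a feature name. Here is a minimal base-R sketch of the difference between substring and whole-word matching. It uses plain grepl() on a small made-up feature vector rather than quanteda's own selection code, and the names `feats`, `pattern`, and `wholeWord` are just mine for illustration.

```r
# Made-up feature names, joined with "_" as in the ngram output
feats <- c("apple", "pineapple", "kiwi_guava", "mango", "lime", "mixed_berry")
flavorwords <- c("apple", "guava", "berry", "kiwi", "grape")

# One alternation pattern: matches if any flavor word occurs as a substring
pattern <- paste(flavorwords, collapse = "|")
substringHits <- feats[grepl(pattern, feats)]
substringHits
# [1] "apple"       "pineapple"   "kiwi_guava"  "mixed_berry"

# Anchor each word to the start/end of the feature or to a "_" separator,
# so that "apple" no longer matches inside "pineapple"
wholeWord <- paste0("(^|_)(", pattern, ")(_|$)")
wholeWordHits <- feats[grepl(wholeWord, feats)]
wholeWordHits
# [1] "apple"       "kiwi_guava"  "mixed_berry"
```

If you do want "pineapple" counted as a flavor, the simpler fix is probably just to add it to flavorwords.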
Added:
If you want to match flavor names written without spaces (e.g. "kiwiguava") against the documents, you can form ngrams with an empty-string concatenator and match them as below.
flavorwordsConcat <- c("applestrawberrylime", "applegrapelime", "pineapplemangoguava",
"kiwiguava", "grapeapple", "mixedberry", "kiwiguavapineapple",
"limemixedberry")
fruitDfm <- dfm(docs, ngrams = 1:3, concatenator = "")
fruitDfm <- fruitDfm[, features(fruitDfm) %in% flavorwordsConcat]
fruitDfm
# Document-feature matrix of: 7 documents, 8 features.
# 7 x 8 sparse Matrix of class "dfmSparse"
# features
# docs applestrawberrylime applegrapelime pineapplemangoguava kiwiguava grapeapple mixedberry kiwiguavapineapple limemixedberry
# text1 1 0 0 0 0 0 0 0
# text2 0 1 0 0 0 0 0 0
# text3 0 0 1 0 0 0 0 0
# text4 0 0 0 1 1 0 0 0
# text5 0 0 0 0 0 1 0 0
# text6 0 0 0 1 0 0 1 0
# text7 0 0 0 0 0 1 0 1
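To make it clearer what the empty concatenator is doing, here is a rough base-R sketch of the same idea for a single document. This is not how quanteda implements it; `makeNgrams` is a hypothetical helper I wrote for illustration, and the tokenizer is a deliberately naive lowercase-and-split.

```r
# Hypothetical helper: build all n-grams of a token vector, joined by `concatenator`
makeNgrams <- function(tokens, n, concatenator = "") {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = concatenator),
         character(1))
}

flavorwordsConcat <- c("applestrawberrylime", "applegrapelime", "pineapplemangoguava",
                       "kiwiguava", "grapeapple", "mixedberry", "kiwiguavapineapple",
                       "limemixedberry")

# Naive tokenization: lowercase, strip punctuation, split on whitespace
doc <- "There is also kiwi guava and grape apple."
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(doc)), "\\s+")[[1]]

# Unigrams, bigrams, and trigrams with no separator between the words
ngrams <- unlist(lapply(1:3, makeNgrams, tokens = tokens))

# Keep only the ngrams that are known concatenated flavor names
matched <- intersect(ngrams, flavorwordsConcat)
matched
# [1] "kiwiguava"  "grapeapple"
```

This agrees with the text4 row of the matrix above, where only kiwiguava and grapeapple are non-zero.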
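If your text instead contains the concatenated flavor words, you can generate concatenated permutations of the individual fruit words and match the features of the unigram dfm against them. Note that combinat::permn() permutes the whole vector, so the call below yields orderings of all five words; for trigrams you would first need to take each three-word subset.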
unigramFlavorWords <- c("apple", "guava", "grape", "pineapple", "kiwi")
head(unlist(combinat::permn(unigramFlavorWords, paste, collapse = "")))
## [1] "appleguavagrapepineapplekiwi" "appleguavagrapekiwipineapple" "appleguavakiwigrapepineapple"
## [4] "applekiwiguavagrapepineapple" "kiwiappleguavagrapepineapple" "kiwiappleguavapineapplegrape"
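The permn() call above permutes all five words at once. If what you need is every ordered three-word combination, one way to sketch it in base R (no combinat needed) is below. `orderedConcat` is a hypothetical helper of mine, and it assumes the input words are distinct.

```r
unigramFlavorWords <- c("apple", "guava", "grape", "pineapple", "kiwi")

# Hypothetical helper: every ordered arrangement of k distinct words, concatenated
orderedConcat <- function(words, k) {
  # enumerate every k-tuple of word indices (repeats included)
  idx <- as.matrix(expand.grid(rep(list(seq_along(words)), k)))
  # drop tuples that use the same word more than once
  keep <- apply(idx, 1, function(r) !anyDuplicated(r))
  idx <- idx[keep, , drop = FALSE]
  # concatenate the words of each remaining tuple with no separator
  apply(idx, 1, function(r) paste(words[r], collapse = ""))
}

trigramFlavors <- orderedConcat(unigramFlavorWords, 3)
length(trigramFlavors)                     # 5 * 4 * 3 = 60
"kiwiguavapineapple" %in% trigramFlavors   # TRUE
```

You could then keep only the dfm features that appear in trigramFlavors, using the same %in% subsetting as in the concatenated example earlier.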