I have this dataframe:
structure(list(CATEGORY = c("Edible, Vape", "Concentrate, Flower",
"Concentrate, Flower", "Concentrate, Flower", "Edible", "Concentrate, Flower",
"Edible, Vape", "Edible", "Concentrate, Flower", "Concentrate, Flower",
"Edible", "Edible", "Edible", "Concentrate, Flower", "Edible",
"Edible", "Edible", "Edible, Vape", "Edible", "Edible", "Concentrate, Flower",
"Edible", "Concentrate, Flower", "Concentrate, Flower", "Concentrate, Flower",
"Edible", "Concentrate, Flower", "Concentrate, Edible, Flower",
"Concentrate, Flower", "Edible", "Concentrate, Edible, Flower",
"Edible", "Concentrate, Edible, Flower", "Concentrate, Edible, Flower, Vape",
"Concentrate, Edible, Flower", "Concentrate, Flower", "Edible",
"Edible", "Edible", "Concentrate, Edible, Flower, Vape", "Concentrate, Flower",
"Concentrate, Flower", "Edible", "Concentrate, Flower", "Concentrate, Flower",
"Concentrate, Flower", "Concentrate, Flower", "Concentrate, Flower",
"Concentrate, Flower", "Edible, Vape", "Concentrate, Flower",
"Edible, Vape", "Concentrate, Edible, Flower", "Edible, Vape",
"Concentrate, Flower", "Edible", "Concentrate, Flower", "Concentrate, Flower",
"Edible", "Concentrate, Flower", "Edible, Vape", "Edible", "Concentrate, Edible, Flower, Vape",
"Edible", "Edible", "Concentrate, Flower", "Concentrate, Flower",
"Edible, Vape", "Concentrate, Flower", "Edible", "Edible", "Edible, Vape",
"Edible", "Edible", "Edible", "Concentrate, Flower", "Edible",
"Edible", "Concentrate, Flower", "Edible, Vape", "Concentrate, Flower",
"Edible", "Edible", "Edible", "Edible", "Concentrate, Flower",
"Edible, Vape", "Edible", "Concentrate, Flower", "Edible, Vape",
"Concentrate, Flower", "Concentrate, Flower", "Concentrate, Flower",
"Concentrate, Flower", "Edible", "Edible", "Edible", "Edible, Vape",
"Concentrate, Flower", "Edible")), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))
Some of the values in the CATEGORY column contain only one item, while others contain two, three, or four. (Some contain even more; this is just a section of a bigger data frame.)
How can I filter the data frame to keep only the rows whose CATEGORY contains a limited number of items, such as one or two?
For example, if I type this:
unique(interesting_baskets_df$CATEGORY)
I see these categories.
[1] "Edible, Vape" "Concentrate, Flower" "Edible" "Concentrate, Edible, Flower"
[5] "Concentrate, Edible, Flower, Vape"
But I only want to include "Edible, Vape" or "Concentrate, Flower" or "Edible".
I know that in this case I could write a specific dplyr filter with a fixed set of values, but my dataset is much larger, so I need a more flexible solution. I would appreciate something that lets me choose the number of items (two, three, or four), since I don't yet know what will be most useful for association rule learning.
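To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind, in base R. It assumes the items in CATEGORY are always separated by a comma. The helper `count_items`, the stand-in data frame `baskets`, and the threshold `max_items` are hypothetical names made up for this illustration.

```r
# Hypothetical helper: count the comma-separated items in each string.
count_items <- function(x, sep = ",\\s*") {
  lengths(strsplit(x, sep))
}

# Small stand-in for interesting_baskets_df, using the unique values shown above.
baskets <- data.frame(
  CATEGORY = c("Edible, Vape", "Concentrate, Flower", "Edible",
               "Concentrate, Edible, Flower",
               "Concentrate, Edible, Flower, Vape"),
  stringsAsFactors = FALSE
)

# Assumed threshold; change to 3 or 4 to allow larger baskets.
max_items <- 2

# Keep only the rows whose CATEGORY has at most max_items items.
small_baskets <- baskets[count_items(baskets$CATEGORY) <= max_items, , drop = FALSE]

unique(small_baskets$CATEGORY)
```

The same condition should also work inside dplyr, for example `filter(baskets, count_items(CATEGORY) <= max_items)`, and swapping `<=` for `==` or `%in% 2:3` would select an exact count or a range instead.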