Hello, I am using RStudio to filter out varieties of wine that appear fewer than 5000 times in a dataset.
I have run the code below:
library(data.table)
# create a new data frame with only the varieties that appear more than 5000 times
wineVar <- setDT(wineNew)[, if (.N > 5000) .SD, by = variety]
# list the unique varieties to show there are only 5
unique(wineVar$variety)
However, when I check how many levels there are, the output still reports all 632 original levels instead of just the 5 that remain:
[1] Cabernet Sauvignon Pinot Noir Chardonnay
[4] Bordeaux-style Red Blend Red Blend
632 Levels: Žilavka Agiorgitiko Aglianico Aidani Airen Albana Albarín ... Zweigelt
Is there a way to completely remove these unused levels? They are causing issues with my training set, i.e. the training set still sees the levels of the dropped varieties even though there is no data for them.
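
For illustration, here is a minimal sketch of one possible approach. The behaviour described above is consistent with how R factors work: filtering rows does not change a factor's set of levels, and base R's droplevels() removes levels that no longer occur. The toy data frame, the points column, and the threshold of 5 are assumptions made up for this example; only wineNew, wineVar, and variety come from the code above.

```r
library(data.table)

# Hypothetical stand-in for wineNew: two common varieties and one rare one
wineNew <- data.frame(
  variety = factor(c(rep("Pinot Noir", 6), rep("Chardonnay", 6), "Aidani")),
  points  = 1:13
)

# Same filter as in the question, with a threshold of 5 instead of 5000
# so that it works on the toy data
wineVar <- setDT(wineNew)[, if (.N > 5) .SD, by = variety]

# The rare variety has no rows left, but its level is still attached
levels_before <- nlevels(wineVar$variety)

# droplevels() discards factor levels that do not occur in the data
wineVar[, variety := droplevels(variety)]

levels_after <- nlevels(wineVar$variety)
remaining    <- levels(wineVar$variety)
```

Calling factor(wineVar$variety) on the filtered column should have the same effect, since re-creating a factor keeps only the values that are present. Either way, it is probably safest to drop the levels before splitting off the training set so that both sets share the same levels.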