I have a data set containing several variables. One of the variables is a factor that holds the group labels (A, B, C, etc.). The remaining variables are numeric.
> data1
Group Value
1 A 23
2 A 25
3 B 1
4 C 15
5 C 11
6 C 14
7 B 3
8 B 4
9 B 2
10 C 19
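For reproducibility, the data frame printed above can be rebuilt roughly as follows. This is a minimal sketch that assumes `Group` is stored as a factor and `Value` as a plain numeric vector; the values are copied from the printout.

```r
# Rebuild the example data frame (values copied from the printout above).
# Assumption: Group is a factor, Value is numeric.
data1 <- data.frame(
  Group = factor(c("A", "A", "B", "C", "C", "C", "B", "B", "B", "C")),
  Value = c(23, 25, 1, 15, 11, 14, 3, 4, 2, 19)
)

# Inspect the structure: 10 rows, a factor column and a numeric column.
str(data1)
```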
For further statistical calculations I want to exclude from the data set the rows that belong to a particular group (e.g., X), on the condition that the group occurs in the data frame fewer than a given number of times (e.g., fewer than 3 times).
The material I have found so far mainly concerns deleting rows with specific values and does not deal with how often a group (factor level) occurs in the data frame. Maybe I'm wrong? Sorry!
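To make the condition concrete, the number of rows per group can be counted with `table()`. The sketch below assumes a threshold of 3 (the name `min_count` is made up for illustration), which is the value that makes group A, with its 2 rows, drop out of the example data.

```r
# Example data, rebuilt so that this snippet runs on its own.
data1 <- data.frame(
  Group = factor(c("A", "A", "B", "C", "C", "C", "B", "B", "B", "C")),
  Value = c(23, 25, 1, 15, 11, 14, 3, 4, 2, 19)
)

# Hypothetical threshold: groups with fewer rows than this are "rare".
min_count <- 3

# Count how many rows each group has.
counts <- table(data1$Group)

# Names of the groups that fall below the threshold.
rare <- names(counts)[counts < min_count]
rare
```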
To remove specific rows manually, I use the following code:
# Drop group "A", then rebuild every factor column so unused levels disappear
data1 <- as.data.frame(
  lapply(subset(data1, Group != "A"),
         function(x) if (is.factor(x)) factor(x) else x
  )
)
I would like to automate this process and exclude all factor levels (groups) that occur fewer than a predetermined number of times, so that the result looks like this:
> data1
Group Value
1 B 1
2 C 15
3 C 11
4 C 14
5 B 3
6 B 4
7 B 2
8 C 19
Addition
User 'Akrun' suggested the following code:
tbl <- table(data1$Group)
data1 <- subset(data1, Group %in% names(tbl)[tbl>2])
This is exactly what I needed, and I thank him for it! However, the factor levels in the result remain unchanged: the removed group is still listed as a level. To correct this, I have to add the following line:
data1$Group = factor(data1$Group)
Surely there is a ready-made solution that handles this case?
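For reference, the two steps above (filtering by frequency, then rebuilding the factor) can be folded into one small helper. This is only a sketch under stated assumptions: `drop_rare_groups`, `group_col` and `min_count` are names invented here, and base R's `droplevels()` is used in place of `factor()` to discard the unused levels.

```r
# Hypothetical helper: keep only groups with at least `min_count` rows,
# then drop the factor levels that no longer occur.
drop_rare_groups <- function(df, group_col, min_count) {
  counts <- table(df[[group_col]])                    # rows per group
  keep <- names(counts)[counts >= min_count]          # frequent groups
  out <- df[df[[group_col]] %in% keep, , drop = FALSE]
  out <- droplevels(out)                              # remove unused levels
  rownames(out) <- NULL                               # renumber rows from 1
  out
}

# Example data, rebuilt so that this snippet runs on its own.
data1 <- data.frame(
  Group = factor(c("A", "A", "B", "C", "C", "C", "B", "B", "B", "C")),
  Value = c(23, 25, 1, 15, 11, 14, 3, 4, 2, 19)
)

# Keep groups that occur at least 3 times (group A has only 2 rows).
result <- drop_rare_groups(data1, "Group", 3)
levels(result$Group)
```

Applying `droplevels()` to the whole data frame affects every factor column, not just `Group`, which may or may not be wanted when several factors are present.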