Subset data set based on most frequent values in a column

Question

I have a data set that looks like the following:

head(data1)
  Data number PatientSID               Condition
1           1   24663193 acute coronary syndrome
2           3    7451277          abdominal pain
3           6    7449440               epistaxis
4           8    7350669                leg pain
5           9    7328477       chronic back pain
6          11    7324432               back pain

I used the aggregate function to see the frequency of patient Conditions:

x <- aggregate(data.frame(count = data1$Condition), list(value = data1$Condition), length)
head(x,10)
                       value count
1                          3   108
2         4 wheeler accident     1
3                  abdominal     1
4         abdominal aneurysm     1
5  abdominal aortic aneurysm     1
6         abdominal bloating     2
7           abdominal cramps     2
8       abdominal discomfort     6
9       abdominal distension     2
10      abdominal distention    21

Now, based on the output above, I want to subset data1 into a data frame that contains only the rows whose Condition has a count >= 10. So my subset would contain all rows with the conditions "3" and "abdominal distention", for instance. How can I do this?

score 1 · Accepted Answer · answered Jun 14 '17 at 14:11

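You can use dplyr to keep the conditions that occur at least 10 times, then subset data1 on those values:

library(dplyr)

# conditions whose frequency is at least 10
x.sub <- x %>%
         filter(count >= 10)

# rows of data1 whose Condition is one of those values
data1.sub <- data1[data1$Condition %in% x.sub$value, ]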
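The same subset can also be built in base R, without dplyr or the intermediate x. This is only a minimal sketch: the toy data1 and the min_count threshold below are made-up stand-ins (the question uses a threshold of 10), so adjust them to your data.

```r
# Toy stand-in for data1 (hypothetical values, not the asker's data)
data1 <- data.frame(
  PatientSID = 1:7,
  Condition  = c("back pain", "back pain", "back pain",
                 "epistaxis", "leg pain", "leg pain", "leg pain"),
  stringsAsFactors = FALSE
)

# Assumed threshold; the question uses 10, lowered here to suit the toy data
min_count <- 3

# table() counts each distinct Condition; keep the names meeting the threshold
counts   <- table(data1$Condition)
frequent <- names(counts)[counts >= min_count]

# Keep only the rows whose Condition is one of the frequent values
data1.sub <- data1[data1$Condition %in% frequent, ]
```

Note that table() ignores NA values by default, so rows with a missing Condition are dropped from the subset.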

akash87


Thank you so much @akash87 ! – Diana01 Jun 14 '17 at 14:19
