I have a data set that looks like the following:

head(data1)
  Data number PatientSID
1           1   24663193
2           3    7451277
3           6    7449440
4           8    7350669
5           9    7328477
6          11    7324432

                Condition
1 acute coronary syndrome
2          abdominal pain
3               epistaxis
4                leg pain
5       chronic back pain
6               back pain

I used the aggregate function to see the frequency of patient Conditions:

x <- aggregate(data.frame(count = data1$Condition), list(value = data1$Condition), length)
head(x,10)
                       value count
1                          3   108
2         4 wheeler accident     1
3                  abdominal     1
4         abdominal aneurysm     1
5  abdominal aortic aneurysm     1
6         abdominal bloating     2
7           abdominal cramps     2
8       abdominal discomfort     6
9       abdominal distension     2
10      abdominal distention    21

Now, based on the output above, I want to subset data1 into a data frame that contains only the rows whose Condition has a count >= 10. For instance, my subset would contain all rows with the conditions "3" and "abdominal distention". How can I do this?

Sotos
Diana01

1 Answer

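You can use dplyr to keep only the conditions that occur at least 10 times, then subset data1 to the rows whose Condition is among them:

library(dplyr)

# Conditions that occur at least 10 times
x.sub <- x %>%
         filter(count >= 10)

# Rows of data1 whose Condition is one of those
data1.sub <- data1[data1$Condition %in% x.sub$value, ]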
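
If you would rather skip the intermediate count table, the same filter can probably be done in one step in base R with ave(). This is only a sketch: the small data1 below is a made-up stand-in for your data (its values are invented for illustration), and n_per_row is a name introduced here, not something from your code.

```r
# Toy stand-in for data1; the values are made up for illustration only
data1 <- data.frame(
  PatientSID = 1:25,
  Condition  = c(rep("abdominal distention", 12),
                 rep("back pain", 10),
                 rep("epistaxis", 3)),
  stringsAsFactors = FALSE
)

# For every row, how many rows in data1 share that row's Condition
n_per_row <- ave(seq_along(data1$Condition), data1$Condition, FUN = length)

# Keep only the rows whose Condition occurs at least 10 times
data1.sub <- data1[n_per_row >= 10, ]

# Check which conditions survived the filter
table(data1.sub$Condition)
```

With dplyr, grouping by Condition and filtering on n() >= 10 should give the same rows; which one to use is mostly a matter of taste.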
akash87