How can I extract a subset based on values of a column that occur more than once?

Question

I have a data frame (my_data) like the one below:

No.       Id         X     Y         Z        Cat
1      A2_A0CX      A2  2010-06-16  A00Z      cat1
2      A8_A076      A8  2010-06-16  A00Z      cat2
3      A8_A07B      A8  2010-06-16  A00Z      cat2
4      A8_A07I      A8  2010-06-16  A00Z      cat2
5      A8_A081      A8  2010-06-16  A00Z      cat2
6      AO_A03L      AO  2010-08-11  A056      cat3
7      AO_A0JE      AO  2010-08-11  A056      cat3
8      A2_A0CX      A2  2010-07-14  A034      cat4
.        .           .      .        .          .
.        .           .      .        .          .
.        .           .      .        .          .

I need a subset that contains only the rows whose "Cat" value occurs more than once. The result should look like this:

No.       Id         X     Y         Z        Cat
2      A8_A076      A8  2010-06-16  A00Z      cat2
3      A8_A07B      A8  2010-06-16  A00Z      cat2
4      A8_A07I      A8  2010-06-16  A00Z      cat2
5      A8_A081      A8  2010-06-16  A00Z      cat2
6      AO_A03L      AO  2010-08-11  A056      cat3
7      AO_A0JE      AO  2010-08-11  A056      cat3
.        .           .      .        .          .
.        .           .      .        .          .
.        .           .      .        .          .

How can I prepare that subset from my_data? After getting the subset, I also want to relabel the remaining Cat values with consecutive numbers starting from 1, as below:

No.       Id         X     Y         Z        Cat
2      A8_A076      A8  2010-06-16  A00Z       1
3      A8_A07B      A8  2010-06-16  A00Z       1
4      A8_A07I      A8  2010-06-16  A00Z       1
5      A8_A081      A8  2010-06-16  A00Z       1
6      AO_A03L      AO  2010-08-11  A056       2
7      AO_A0JE      AO  2010-08-11  A056       2
.        .           .      .        .          .
.        .           .      .        .          .
.        .           .      .        .          .

In `dplyr`, you can do: `my_data %>% group_by(Cat) %>% filter(n() > 1)` - Ronak Shah, Dec 07 '20 at 08:13
You could simply create a vector that counts the occurrences of your categories, filter this vector to those categories with a count > 1, and then use it to index your data frame, like `data[data$CAT %in% my_vector,]` - deschen, Dec 07 '20 at 08:13
Dear @ronak-shah, could you please guide me on the second part of my question? - Mohammad, Dec 07 '20 at 09:12
You should not edit your question to include a follow-up question. Each post should have only one question; you should have asked a new question instead. Anyway, your answer is `my_data$Cat <- match(my_data$Cat, unique(my_data$Cat))` after you have done the above filter. - Ronak Shah, Dec 07 '20 at 09:20
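
The two suggestions in the comments (filter to categories that occur more than once, then renumber with `match`) can be combined into one base R sketch. This is only an illustration: the data frame below is rebuilt from the eight rows shown in the question (the elided rows are not guessed), the first column is named `No` here, and `cat_counts`, `repeated_cats`, and `my_subset` are hypothetical names not used in the original post.

```r
# Example data: only the eight rows visible in the question
my_data <- data.frame(
  No  = 1:8,
  Id  = c("A2_A0CX", "A8_A076", "A8_A07B", "A8_A07I",
          "A8_A081", "AO_A03L", "AO_A0JE", "A2_A0CX"),
  X   = c("A2", "A8", "A8", "A8", "A8", "AO", "AO", "A2"),
  Y   = as.Date(c("2010-06-16", "2010-06-16", "2010-06-16", "2010-06-16",
                  "2010-06-16", "2010-08-11", "2010-08-11", "2010-07-14")),
  Z   = c("A00Z", "A00Z", "A00Z", "A00Z", "A00Z", "A056", "A056", "A034"),
  Cat = c("cat1", "cat2", "cat2", "cat2", "cat2", "cat3", "cat3", "cat4"),
  stringsAsFactors = FALSE
)

# Count how often each Cat value occurs
cat_counts <- table(my_data$Cat)

# Names of the categories that occur more than once
repeated_cats <- names(cat_counts[cat_counts > 1])

# Keep only the rows whose Cat is one of those categories
my_subset <- my_data[my_data$Cat %in% repeated_cats, ]

# Relabel the remaining categories 1, 2, ... in order of first appearance
my_subset$Cat <- match(my_subset$Cat, unique(my_subset$Cat))

print(my_subset)
```

On these eight rows, `my_subset` keeps rows 2 to 7 and `my_subset$Cat` becomes 1 1 1 1 2 2. Note that the relabeling is applied to the filtered data frame, not to the original `my_data`, so the numbering has no gaps from the dropped categories.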
