I have this dataset:
df <- structure(list(Species = c("Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Australopithecus afarensis", "Australopithecus afarensis",
"Australopithecus afarensis", "Australopithecus afarensis", "Australopithecus afarensis",
"Paranthropus boisei", "Australopithecus afarensis", "Paranthropus boisei",
"Australopithecus africanus", "Australopithecus africanus", "Australopithecus africanus",
"Paranthropus robustus", "Australopithecus africanus", "Australopithecus africanus",
"Paranthropus robustus", "Australopithecus africanus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Australopithecus africanus",
"Australopithecus afarensis", "Australopithecus afarensis", "Australopithecus afarensis",
"Australopithecus afarensis", "Australopithecus afarensis", "Australopithecus afarensis",
"Australopithecus afarensis", "Australopithecus afarensis", "Australopithecus afarensis",
"Australopithecus afarensis", "Paranthropus boisei", "Paranthropus boisei",
"Australopithecus africanus", "Australopithecus africanus", "Paranthropus boisei",
"Paranthropus boisei", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Paranthropus robustus", "Paranthropus robustus", "Paranthropus robustus",
"Australopithecus africanus", "Paranthropus robustus", "Ardipithecus ramidus",
"Ardipithecus ramidus", "Ardipithecus ramidus", "Homo habilis",
"Homo habilis", "Paranthropus robustus"), `Site / Population` = c("Drimolen",
"Drimolen", "Drimolen", "Nefuraytu: Woranso-Mille (Central Afar, Ethiopia)",
"Laetolil", "Laetolil", "Laetolil", "Laetolil", "Lake Turkana",
"Laetolil", NA, "Makapansgat", "Makapansgat", "Makapansgat",
"Kroomdrai", "Taung", "Taung", "Kroomdrai", "Makapansgat", "Swartkrans",
"Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans",
"Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans",
"Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans",
"Swartkrans", "Swartkrans", "Sterkfontein", "Hadar", "Hadar",
"Hadar", "Hadar", "Hadar", "Hadar", "Hadar", "Hadar", "Hadar",
"Hadar", "Koobi Fora", "East Turkana", "Makapansgat", "Makapansgat",
"Peninj", "Peninj", "Swartkrans", "Swartkrans", "Swartkrans",
"Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans", "Swartkrans",
"Sterkfontein", "Kroomdrai", "Aramis, Middle Awash", "Aramis, Middle Awash",
"Aramis, Middle Awash", "Sterkfontein", "Sterkfontein", "Sterkfontein"
), Specimen = c("DNH 7", "DNH 8", "DNH 8", "NFR-VP-1/29", "LH-2",
"LH-3", "LH-4", "LH-4", "KNM-WT 16005", "LH-16", "KNM-ER 15930",
"MLD 2", "MLD 2", "Rev. Paper", "Rev. Paper", "Rev. Paper", "Rev. Paper",
"Rev. Paper", "Rev. Paper", "SK 104", "SK 23", "SK 23", "SK 25",
"SK 25", "SK 34", "SK 55b", "SK 55b", "SK 6", "SK 6", "SK 61",
"SK 63", "SK 63", "SK 828", "SK 838", "SK 843", "SK 845", "SK 846",
"Sts 52b", "AL 128-23", "AL 145-35", "AL 266-1", "AL 288-1i",
"AL 333-74", "AL 333w-1", "AL 333w-1", "AL 333w-32,60", "AL 400-1a",
"AL 400-1a", "KNM-ER 3230", "KNM-ER 729", "MLD 18/4/24", "MLD 40",
"NMT-W64-160", "NMT-W64-160", "SK 1587", "SK 1648", "SK 34",
"SK 34", "SK 843", "SK 858", "SK 876", "SK 876", "Stw 14", "TM 1517b",
"ARA-VP-1/128", "ARA-VP-1/128", "ARA-VP-1/200", "Stw 151", "Stw 151",
"Stw 566")), class = "data.frame", row.names = c(9L, 26L, 28L,
385L, 398L, 408L, 416L, 417L, 428L, 432L, 444L, 545L, 546L, 549L,
550L, 552L, 553L, 555L, 557L, 560L, 563L, 564L, 569L, 570L, 572L,
577L, 578L, 581L, 582L, 587L, 588L, 589L, 591L, 592L, 595L, 598L,
600L, 601L, 710L, 712L, 716L, 719L, 722L, 724L, 726L, 728L, 735L,
738L, 744L, 748L, 753L, 758L, 791L, 794L, 802L, 804L, 806L, 809L,
812L, 814L, 816L, 819L, 824L, 825L, 841L, 842L, 846L, 897L, 898L,
899L))
The first few rows look like this:
head(df)
                       Species                                 Site / Population    Specimen
9        Paranthropus robustus                                          Drimolen       DNH 7
26       Paranthropus robustus                                          Drimolen       DNH 8
28       Paranthropus robustus                                          Drimolen       DNH 8
385 Australopithecus afarensis Nefuraytu: Woranso-Mille (Central Afar, Ethiopia) NFR-VP-1/29
398 Australopithecus afarensis                                          Laetolil        LH-2
408 Australopithecus afarensis                                          Laetolil        LH-3
First, we need to look at the first column (Species). If a category (e.g. Homo habilis) has fewer than 3 rows (which is the case here, since it has only 2), I would like to remove all the rows belonging to that category. In other words, I would like to count the number of rows per Species and drop every species whose count is less than 3.
How could I do it?
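One possible approach, as a minimal sketch in base R: count the rows per Species with table(), collect the species that reach the threshold, and subset on those. The names toy, min_rows, counts, keep and filtered are illustrative and not part of the original data; toy is a small subset of rows from df, used so that the example runs on its own.

```r
# Small stand-in for df (a few rows copied from the data above),
# so the example runs on its own.
toy <- data.frame(
  Species = c("Paranthropus robustus", "Paranthropus robustus",
              "Paranthropus robustus", "Homo habilis", "Homo habilis",
              "Ardipithecus ramidus", "Ardipithecus ramidus",
              "Ardipithecus ramidus"),
  Specimen = c("DNH 7", "DNH 8", "DNH 8", "Stw 151", "Stw 151",
               "ARA-VP-1/128", "ARA-VP-1/128", "ARA-VP-1/200"),
  stringsAsFactors = FALSE
)

min_rows <- 3  # threshold from the question (illustrative name)

# Count how many rows each species has
counts <- table(toy$Species)

# Species that have at least `min_rows` rows
keep <- names(counts)[counts >= min_rows]

# Keep only the rows whose species is in that set
filtered <- toy[toy$Species %in% keep, ]

# Equivalent one-step variant: attach the group size to every row with ave()
group_size <- ave(seq_along(toy$Species), toy$Species, FUN = length)
filtered_ave <- toy[group_size >= min_rows, ]
```

On the real data the same steps would apply with df in place of toy. Subsetting this way keeps the original row names, which may or may not be wanted.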