Determine for which ID are all conditions satisfied in R

Question

I think this should be relatively simple. I am using the latest release of R. In a data frame, I have a column with ID numbers called PairID and a column called species with 15 different species. I want to know which PairID numbers have all 15 species.

The data frame looks something like this:

head(analysis.df)
species     PairID
DIKDIK        1
GAZELLE       2
GIRAFFE       1
ELAND         5
GIRAFFE       3
DIKDIK        2

My idea was to run this:

for(i in 1:nrow(analysis.df)) {
  if (analysis.df$species[i]=="GRANTS GAZELLE") {analysis.df$GRANTS GAZELLE[i] <- 1}
  else if (analysis.df$species[i]=="DIKDIK") {analysis.df$DIKDIK[i] <- 1}
  else if (analysis.df$species[i]=="IMPALA") {analysis.df$IMPALA[i] <- 1}
  else if (analysis.df$species[i]=="BUFFALO") {analysis.df$BUFFALO[i] <- 1}
  else if (analysis.df$species[i]=="BUSHBUCK") {analysis.df$BUSHBUCK[i] <- 1}
  else if (analysis.df$species[i]=="GIRAFFE") {analysis.df$GIRAFFE[i] <- 1}
  else if (analysis.df$species[i]=="ELAND") {analysis.df$ELAND[i] <- 1}
  else if (analysis.df$species[i]=="GERENUK") {analysis.df$GERENUK[i] <- 1}
  else if (analysis.df$species[i]=="LESSER KUDU") {analysis.df$LESSER KUDU[i] <- 1}
  else if (analysis.df$species[i]=="HARTEBEEST") {analysis.df$HARTEBEEST[i] <- 1}
  else if (analysis.df$species[i]=="STEENBOK") {analysis.df$STEENBOK[i] <- 1}
  else if (analysis.df$species[i]=="ORYX") {analysis.df$ORYX[i] <- 1}
  else if (analysis.df$species[i]=="REEDBUCK") {analysis.df$REEDBUCK[i] <- 1}
  else if (analysis.df$species[i]=="THOMSONS GAZELLE") {analysis.df$THOMSONS GAZELLE[i] <- 1}
  else if (analysis.df$species[i]=="WATERBUCK") {analysis.df$WATERBUCK[i] <- 1}

}

Then I could summarize all rows that have a 1 in every one of these newly created columns.

But this code gives the error:

> Error: unexpected symbol in:
"for(i in 1:nrow(analysis.df)){
  if (analysis.df$species[i]=="GRANTS GAZELLE") {analysis.df$GRANTS GAZELLE"

I have looked here and here, plus some R vignettes and Google searches, but haven't been able to crack it so far. I am not even sure this method would give me what I want, and I would happily consider any suggestions for achieving the goal stated at the beginning of this post.

score 1 · Accepted Answer · answered Mar 13 '19 at 22:20

It sounds like what you want to do is group your data by ID and then summarize the members of species based on a condition. Since you don't provide a reproducible example, I'll use mtcars. Here we group by number of gears, and then check whether the carb column contains all the provided values (1, 2, 3, and 4):

library(dplyr)
mtcars %>%
    group_by(gear) %>%
    summarize(all_carb = all(c(1,2,3,4) %in% carb))

# A tibble: 3 x 2
   gear all_carb
  <dbl> <lgl>   
1     3 TRUE    
2     4 FALSE   
3     5 FALSE

In your case, you'd do something like:

analysis.df %>%
    group_by(PairID) %>%
    summarize(all_species = all(species_list %in% species))

This assumes species_list is a vector containing the values of species you want to check for.
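
To make the pattern concrete, here is a minimal sketch on made-up data. The names toy.df, result, and complete_ids are hypothetical, and the species list is cut down to three entries for brevity; with the real data, species_list would hold all 15 names.

```r
library(dplyr)

# Hypothetical species list (3 entries instead of the 15 in the question)
species_list <- c("DIKDIK", "GIRAFFE", "ELAND")

# Hypothetical toy data: only PairID 1 has every species in species_list
toy.df <- data.frame(
  species = c("DIKDIK", "GIRAFFE", "ELAND", "DIKDIK", "GIRAFFE", "ELAND", "ELAND"),
  PairID  = c(1, 1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# One row per PairID, with TRUE where all listed species occur
result <- toy.df %>%
  group_by(PairID) %>%
  summarize(all_species = all(species_list %in% species))

# Extract just the IDs that have the full complement
complete_ids <- result %>%
  filter(all_species) %>%
  pull(PairID)

complete_ids
# [1] 1
```

Repeated species within a PairID do not affect the result, because %in% only asks whether each listed species appears at least once in the group.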

This was perfect and efficient to boot – Nebulloyd Mar 14 '19 at 16:05

ThomasPepperz · Answer 2 · 2019-03-13T22:21:22.167

0

Try this:

dplyr::filter(dplyr::group_by(analysis.df, PairID), dplyr::n_distinct(species) > 14)

Be sure to install the dplyr package if it's not already installed and loaded.

In the code that you wrote, you'll need to wrap any column name that contains a space in backticks, for example dataframe$`Column with a space`.
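
As a small illustration of the backtick rule, here is a sketch on a made-up two-row data frame (toy.df is a hypothetical name, not the asker's data). It also uses a vectorized comparison in place of the row-by-row loop, which is my assumption about what the indicator column is meant to hold.

```r
# Hypothetical two-row data frame
toy.df <- data.frame(species = c("GRANTS GAZELLE", "DIKDIK"),
                     stringsAsFactors = FALSE)

# Backticks let $ refer to a column whose name contains a space.
# The comparison is vectorized, so no for loop is needed.
toy.df$`GRANTS GAZELLE` <- as.integer(toy.df$species == "GRANTS GAZELLE")

# [[ ]] with a character string reaches the same column without backticks
toy.df[["GRANTS GAZELLE"]]
# [1] 1 0
```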

edited Mar 13 '19 at 22:21

answered Mar 13 '19 at 22:15

Very useful input on how to treat spaces. Alas, the answer above was sufficient and made it easier to assess the final outcome (i.e. which IDs had the full complement of species). – Nebulloyd Mar 14 '19 at 16:11
