0

I think this should be relatively simple. I am using the latest release of R. In a data frame, I have a column with ID numbers called PairID and a column called species with 15 different species. I want to know which PairID numbers have all 15 species.

The data frame looks something like this:

head(analysis.df)
species     PairID
DIKDIK        1
GAZELLE       2
GIRAFFE       1
ELAND         5
GIRAFFE       3
DIKDIK        2

My idea was to run this:

for(i in 1:nrow(analysis.df)) {
  if (analysis.df$species[i]=="GRANTS GAZELLE") {analysis.df$GRANTS GAZELLE[i] <- 1}
  else if (analysis.df$species[i]=="DIKDIK") {analysis.df$DIKDIK[i] <- 1 
  else if (analysis.df$species[i]=="IMPALA") {analysis.df$IMPALA[i] <- 1}
  else if (analysis.df$species[i]=="BUFFALO") {analysis.df$BUFFALO[i] <- 1}
  else if (analysis.df$species[i]=="BUSHBUCK") {analysis.df$BUSHBUCK[i] <- 1}
  else if (analysis.df$species[i]=="GIRAFFE") {analysis.df$GIRAFFE[i] <- 1}
  else if (analysis.df$species[i]=="ELAND") {analysis.df$ELAND[i] <- 1}
  else if (analysis.df$species[i]=="GERENUK") {analysis.df$GERENUK[i] <- 1}
  else if (analysis.df$species[i]=="LESSER KUDU") {analysis.df$LESSER KUDU[i] <- 1}
  else if (analysis.df$species[i]=="HARTEBEEST") {analysis.df$HARTEBEEST[i] <- 1}
  else if (analysis.df$species[i]=="STEENBOK") {analysis.df$STEENBOK[i] <- 1}
  else if (analysis.df$species[i]=="ORYX") {analysis.df$ORYX[i] <- 1}
  else if (analysis.df$species[i]=="REEDBUCK") {analysis.df$REEDBUCK[i] <- 1}
  else if (analysis.df$species[i]=="THOMSONS GAZELLE") {analysis.df$THOMSONS GAZELLE[i] <- 1}
  else if (analysis.df$species[i]=="WATERBUCK") {analysis.df$WATERBUCK[i] <- 1}

}

Then I could summarize the rows to find which PairID values have a 1 in every one of these newly created columns.

But this code gives the error:

> Error: unexpected symbol in:
"for(i in 1:nrow(analysis.df)){
  if (analysis.df$species[i]=="GRANTS GAZELLE") {analysis.df$GRANTS GAZELLE"

I have looked here and here, plus some R vignettes and Google searches, but haven't been able to crack it so far. I am not even sure this method would give me what I want, and I would happily consider any suggestions for achieving the goal stated at the beginning of this post.

Nebulloyd

2 Answers

1

It sounds like what you want to do is group your data by ID and then summarize the members of species based on a condition. Since you don't provide a reproducible example, I'll use mtcars. Here we group by number of gears, and then check whether the carb column contains all the provided values (1, 2, 3, and 4):

library(dplyr)
mtcars %>%
    group_by(gear) %>%
    summarize(all_carb = all(c(1,2,3,4) %in% carb))

# A tibble: 3 x 2
   gear all_carb
  <dbl> <lgl>   
1     3 TRUE    
2     4 FALSE   
3     5 FALSE   

In your case, you'd do something like:

analysis.df %>%
    group_by(PairID) %>%
    summarize(all_species = all(species_list %in% species))

This assumes species_list is a vector containing the values of species you want to check for, for example unique(analysis.df$species).
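
As a minimal sketch of the same idea on data shaped like the question's, the toy data frame below is made up for illustration and uses three species instead of 15. The names toy.df, species_list and complete_ids are assumptions, not from the original post. It goes one step past the summary and pulls out just the PairID values that have every species:

```r
library(dplyr)

# Hypothetical toy data: three species instead of the real 15
toy.df <- data.frame(
  species = c("DIKDIK", "GAZELLE", "GIRAFFE", "DIKDIK", "GIRAFFE", "DIKDIK"),
  PairID  = c(1, 1, 1, 2, 2, 3)
)

# Every species that appears anywhere in the data
species_list <- unique(toy.df$species)

complete_ids <- toy.df %>%
  group_by(PairID) %>%
  summarize(all_species = all(species_list %in% species)) %>%
  filter(all_species) %>%   # keep only groups where the check is TRUE
  pull(PairID)              # return the IDs as a plain vector

complete_ids
```

Here only PairID 1 has all three species, so it is the only value returned.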

divibisan
0

Try this:

dplyr::filter(dplyr::group_by(analysis.df, PairID), dplyr::n_distinct(species) > 14)

Be sure to install the dplyr package if it's not already installed and loaded.

In the code that you wrote, any column name that contains a space needs backticks around it: dataframe$`Column with a space`.
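
To make the backtick point concrete, here is a small base R sketch on made-up data (toy.df and its values are assumptions for illustration). It creates indicator columns without a loop, then shows that a cross-tabulation with table() can produce the per-ID presence counts directly, which may make the indicator columns unnecessary:

```r
# Hypothetical toy data with a species name that contains a space
toy.df <- data.frame(
  species = c("GRANTS GAZELLE", "DIKDIK", "GRANTS GAZELLE", "DIKDIK"),
  PairID  = c(1, 1, 2, 3)
)

# After $, a column name with a space must be wrapped in backticks
toy.df$`GRANTS GAZELLE` <- as.integer(toy.df$species == "GRANTS GAZELLE")

# Equivalent form: [[ ]] takes the name as an ordinary string
toy.df[["DIKDIK"]] <- as.integer(toy.df$species == "DIKDIK")

# Cross-tabulate PairID against species (rows = IDs, columns = species)
counts <- table(toy.df$PairID, toy.df$species)

# Keep the IDs whose row has a non-zero count for every species
complete_ids <- rownames(counts)[rowSums(counts > 0) == ncol(counts)]
complete_ids
```

Note that the vectorized comparison fills non-matching rows with 0, whereas the loop in the question would leave them as NA. Also, rownames() returns the IDs as character strings, so convert with as.numeric() if numbers are needed.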

ThomasPepperz
  • Very useful input on how to treat spaces. Alas, the answer above was sufficient and made it easier to assess the final outcome (i.e. which IDs had the full complement of species). – Nebulloyd Mar 14 '19 at 16:11