
Working in R, I have a dataframe (species_A) with species observed in several years and a lot of other information. I have a second dataframe (species_B) with species and the individual years they were sampled in.

I want to compare species_A with species_B so that species_A only contains species that are listed in species_B in the respective year.

This is what my data looks like:

species_A:

ID | species             | year
1  | Diatoma vulgaris    | 2005
2  | Diatoma vulgaris    | 2006
3  | Nitzschia dissipata | 2006
4  | Nitzschia palea     | 2007

The dataframe species_B is structured in the same way but does not contain all the rows of species_A.

This is the code I came up with, but it only checks whether the species in species_A occur anywhere in species_B, regardless of year. What I want is to group the species by year and then compare the data frames.

species_A <- species_A[ species_A$species %in% species_B$species, ]

Can this possibly be done via dplyr?

gamel
  • You could `paste0` `species` and `year` together and see if those match (so for example `Diatoma vulgaris2005` will not match `Diatoma vulgaris2006`) – JDL Feb 01 '18 at 16:34
  • Possible duplicate of http://stackoverflow.com/questions/1299871 – zx8754 Feb 01 '18 at 16:58

2 Answers

Can this possibly be done via dplyr?

Sure.

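set.seed(256)  # make the random example data reproducible
library(dplyr)

# Example data standing in for species_A
specA <- 
  data.frame(id = 1:10, 
             spec = sample(LETTERS[1:5], 10, TRUE), 
             year = sample(2000:2005, 10, TRUE))

# Example data standing in for species_B
specB <- 
  data.frame(id = 1:10, 
             spec = sample(LETTERS[1:5], 10, TRUE), 
             year = sample(2000:2005, 10, TRUE))

# Keep only the rows of specA whose spec/year combination also occurs in specB.
# Both data frames have an id column, so the result contains id.x and id.y,
# and a row of specA is repeated if its combination occurs more than once in specB.
specA %>%
  inner_join(specB, by = c("spec", "year"))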
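
If the goal is only to filter species_A, without adding columns from species_B or repeating rows, a filtering join may be a closer fit. The following is a minimal sketch using dplyr's `semi_join`; the column names follow the question, and the contents of species_B are invented here for illustration.

```r
library(dplyr)

# Toy data modelled on the question; the rows of species_B are made up
species_A <- data.frame(
  ID      = 1:4,
  species = c("Diatoma vulgaris", "Diatoma vulgaris",
              "Nitzschia dissipata", "Nitzschia palea"),
  year    = c(2005, 2006, 2006, 2007)
)
species_B <- data.frame(
  species = c("Diatoma vulgaris", "Nitzschia dissipata", "Nitzschia palea"),
  year    = c(2006, 2006, 2005)
)

# Keep the rows of species_A whose (species, year) pair occurs in species_B.
# Only the columns of species_A are returned, and no row is duplicated.
species_A_filtered <- semi_join(species_A, species_B, by = c("species", "year"))
species_A_filtered
```

With this toy data, "Nitzschia palea" is dropped because species_B lists it for 2005 rather than 2007.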
Steven

You should look up subsetting if you want to compare the two data frames directly. Using the iris data set, this is how to approach the problem:

iris[iris$Species == "setosa",]

This gets all the rows where Species is equal to setosa. This is the basis for more advanced subsetting.

Now we create a second data frame to replicate your problem:

other_list <- iris[iris$Species %in% c("setosa", "virginica"),]

Now we filter the original data frame based on the second one:

iris[iris$Species %in% other_list$Species,]
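
The same `%in%` pattern can be extended to match on species and year together by building a combined key, as suggested in the comments on the question. This is a rough sketch in base R; the contents of species_B are made up for illustration, and the "_" separator is an assumption that works as long as it cannot make two different species/year pairs look identical.

```r
# Toy data modelled on the question; the rows of species_B are made up
species_A <- data.frame(
  ID      = 1:4,
  species = c("Diatoma vulgaris", "Diatoma vulgaris",
              "Nitzschia dissipata", "Nitzschia palea"),
  year    = c(2005, 2006, 2006, 2007)
)
species_B <- data.frame(
  species = c("Diatoma vulgaris", "Nitzschia dissipata", "Nitzschia palea"),
  year    = c(2006, 2006, 2005)
)

# Build one key per row from species and year, e.g. "Diatoma vulgaris_2006"
key_A <- paste(species_A$species, species_A$year, sep = "_")
key_B <- paste(species_B$species, species_B$year, sep = "_")

# Keep the rows of species_A whose key also appears in species_B
species_A_filtered <- species_A[key_A %in% key_B, ]
species_A_filtered
```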
Preston