-1

Is there a way to obtain the annual count values based on the state, species, and year, without using a for loop?

Name    |  State   |  Age  | Species    | Annual Ct
Nemo    |  NY      |  5    | Clownfish  | ?
Dora    |  CA      |  2    | Regal Tang | ?

Lookup table:

State |  Species    | Year  | AnnualCt
NY    |  Clownfish  |  2012 |  500
NY    |  Clownfish  |  2014 |  200
CA    |  Regal Tang |  2001 |  400
CA    |  Regal Tang |  2014 |  680
CA    |  Regal Tang |  2000 |  700

The output would be:

Name  |  State |  Age | Species    | Annual Ct
Nemo  |  NY    |  5   | Clownfish  | 200
Dora  |  CA    |  2   | Regal Tang | 680

What I've tried:

pets <- data.frame("Name" = c("Nemo","Dora"), "State" = c("NY","CA"),
               "Age" = c(5,2), "Species" = c("Clownfish","Regal Tang"))
fishes <- data.frame("State" = c("NY","NY","CA","CA","CA"), 
                  "Species" = c("Clownfish","Clownfish","Regal Tang", 
                                "Regal Tang", "Regal Tang"),
                  "Year" = c("2012","2014","2001","2014","2000"),
                  "AnnualCt" = c("500","200","400","680","700"))

pets["AnnualCt"] <- NA

for (row in (1:nrow(pets))){
pets$AnnualCt[row] <- as.character(droplevels(fishes[which(fishes$State == pets[row,]$State & 
                                   fishes$Species == pets[row,]$Species & 
                                   fishes$Year == 2014), 
                           which(colnames(fishes)=="AnnualCt")]))
}
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 2
    Yep, this is a join or merge on `state/species/year` - see https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right for a whole host of info on joins. – thelatemail Mar 21 '18 at 01:47
  • Also https://stackoverflow.com/q/6112260/ for joins specifically realted to adding a column to an existing table. – Frank Mar 21 '18 at 01:48
  • The reproducible example which you have shared is different from the one which you have showed. – Ronak Shah Mar 21 '18 at 01:49
  • @RonakShah: typo thanks! –  Mar 21 '18 at 02:01
  • 1
    @Lemon The `Age` column in `pets` is still different. Moreover, how do you get answer as 200 for 1st row and why is the `State` name changed to "Clownfish" in expected output? – Ronak Shah Mar 21 '18 at 02:04
  • @thelatemail: This works but a bit convoluted. Would this be the best way?: pets <- merge(x = pets, y = fishes, by = "State", all.x = TRUE) pets <- pets[which(pets$Year == "2014"), ] pets <- pets[, -match("Species.x",names(pets))] –  Mar 21 '18 at 02:30
  • @Frank: Top answer uses a for loop if the values aren't unique (states here). I'll see if the others work –  Mar 21 '18 at 02:31
  • @RonakShah updated thanks! My output has 200 for the 'nemo' first row. Is your output different? –  Mar 21 '18 at 02:44

1 Answers1

2

I'm confused as to what you're trying to do; isn't this just this?

library(dplyr);
left_join(pets, fishes) %>%
    filter(Year == 2014) %>% 
    select(-Year);
#Joining, by = c("State", "Species")
#  Name State Age    Species AnnualCt
#1 Nemo    NY   5  Clownfish      200
#2 Dora    CA   2 Regal Tang      680

Explanation: left_join both data.frames by State and Species, filter for Year == 2014 and output without Year column.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68