
I have two data frames: 1) dat, which has >8000 observations, and 2) abun_dat, which has <300 observations. Each observation in dat represents measurements on an individual animal at a specific site in a specific year and season. Each observation in abun_dat represents measurements at the site level for a given year and season.

I would like to append one particular column from abun_dat onto dat. For example:

dat:
      site year season
18 Avalon 2014 Winter
19 Avalon 2014 Winter
20 Avalon 2014 Winter
21 Avalon 2014 Spring
22 Avalon 2014 Spring
23 Avalon 2014 Spring

plus

abun_dat:
    site year season ave_ajust_abun
1 Avalon 2014 Winter       26.79167
2 Avalon 2014 Spring       42.79167
3 Avalon 2015 Summer       52.00000
4 Avalon 2015 Autumn       20.30769
5 Avalon 2015 Winter       29.79487
6 Avalon 2015 Spring       42.25641

would become a data frame with the same number of rows as dat, but with the column from abun_dat appended, like below:

      site year season ave_ajust_abun
18 Avalon 2014 Winter 26.79167
19 Avalon 2014 Winter 26.79167
20 Avalon 2014 Winter 26.79167
21 Avalon 2014 Spring 42.79167
22 Avalon 2014 Spring 42.79167
23 Avalon 2014 Spring 42.79167

I tried to do this using an ifelse() statement (see the code below), but it does not work because the two datasets have different lengths. I also tried a for loop, with no success.

dat$ave_ajust_abun <- ifelse(dat$chr.site == abun_dat$chr.site & dat$year == abun_dat$year & dat$chr.season == abun_dat$chr.season, abun_dat$ave_ajust_abun, NA)
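The length problem can be seen on a small scale. This is only a sketch with made-up vectors (dat_season and abun_season are hypothetical stand-ins for the season columns of the two data frames), showing that == compares position by position and recycles the shorter vector instead of looking each value up:

```r
# Hypothetical toy vectors standing in for the season columns
dat_season  <- c("Winter", "Winter", "Winter", "Spring", "Spring", "Spring")
abun_season <- c("Winter", "Spring", "Summer")

# `==` is elementwise: element i of dat_season is compared with element i of
# abun_season, and the shorter vector is recycled to length 6
# (Winter, Spring, Summer, Winter, Spring, Summer)
cmp <- dat_season == abun_season
print(cmp)
# TRUE FALSE FALSE FALSE  TRUE FALSE
```

Only positions that happen to line up are TRUE, so most rows of dat would get NA even though a matching row exists in abun_dat.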

Dharman
Pat Taggart
  • `ifelse` needs all arguments to have the same length. Also, `==` does an elementwise comparison, so if one of the datasets has a different length, the values of the shorter one are recycled. Maybe use `%in%` instead of `==` – akrun Nov 10 '19 at 23:05
  • Maybe `merge` is better here, since it can compare and combine values across multiple columns. Use `merge(dat, abun_dat)` – Ronak Shah Nov 10 '19 at 23:41
  • sounds like a job for `dplyr::left_join` – Simon Woodward Nov 10 '19 at 23:43
  • I think this is more than just a join though. Each value of the site level variable in abun_dat must be joined to dat, but also must be replicated for all cases where sites, years and seasons match. For example, there could be up to 20 cases in dat where sites, years and seasons match those in abun_dat - I need the site level variable populated for all of these cases in dat. – Pat Taggart Nov 11 '19 at 00:00
  • I found a solution: `new_dat <- merge(dat, abun_dat[, c("chr.site", "year", "chr.season", "ave_ajust_abun")], all = TRUE, by = c("chr.site", "year", "chr.season"))` This replicates values from `abun_dat$ave_ajust_abun` to all cases in `dat` where sites, years and seasons match – Pat Taggart Nov 11 '19 at 00:46
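
The merge approach from the comments can be tried on the example data from the question. This is a minimal sketch, not a definitive answer: it assumes the key columns are named site, year and season as in the printed example (the code in the question uses chr.site and chr.season), and the names keys, left and full are made up for illustration. It also compares all.x = TRUE (a left join, which keeps exactly the rows of dat) with all = TRUE (a full outer join, which also adds rows of abun_dat that have no match in dat).

```r
# Toy data copied from the example in the question
dat <- data.frame(
  site   = rep("Avalon", 6),
  year   = rep(2014, 6),
  season = rep(c("Winter", "Spring"), each = 3)
)
abun_dat <- data.frame(
  site   = rep("Avalon", 6),
  year   = c(2014, 2014, 2015, 2015, 2015, 2015),
  season = c("Winter", "Spring", "Summer", "Autumn", "Winter", "Spring"),
  ave_ajust_abun = c(26.79167, 42.79167, 52.00000, 20.30769, 29.79487, 42.25641)
)

# Assumed key columns (hypothetical helper variable)
keys <- c("site", "year", "season")

# Left join: every row of dat is kept, and the site-level value is
# repeated for each individual with the same site, year and season
left <- merge(dat, abun_dat[, c(keys, "ave_ajust_abun")],
              by = keys, all.x = TRUE)

# Full outer join: additionally keeps abun_dat rows with no match in dat
full <- merge(dat, abun_dat[, c(keys, "ave_ajust_abun")],
              by = keys, all = TRUE)

nrow(left)  # 6, same as nrow(dat)
nrow(full)  # 10: the 6 matched rows plus 4 unmatched 2015 rows from abun_dat
```

If the result has to have the same number of rows as dat, all.x = TRUE is the safer choice. Note that merge sorts the result by the key columns by default, so the original row order and row names of dat are not preserved.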
