How to aggregate multiple observations of presence/absence data within a location in R?

Question

I have multiple species presence observations per location, split by collection time, but I would like to know whether each species appeared at that location at any time. My data currently looks like this:

### Location   Collection_time   Species   Presence
#    loc1        6-8PM             Sp1        Y
#    loc1        6-8PM             Sp2        N
#    loc1        8-10PM            Sp1        N
#    loc1        8-10PM            Sp2        Y
#    loc1        10-12PM           Sp1        N
#    loc1        10-12PM           Sp2        N
#    loc2        6-8PM             Sp1        Y
#    loc2        6-8PM             Sp2        N
#    loc2        8-10PM            Sp1        N
#    loc2        8-10PM            Sp2        N
#    loc2        10-12PM           Sp1        N
#    loc2        10-12PM           Sp2        N

What I would like to achieve is a new data frame with one presence/absence value per species and location, regardless of collection time, like this:

### Location  Species   Presence
     loc1      Sp1          Y 
     loc1      Sp2          Y 
     loc2      Sp1          Y 
     loc2      Sp2          N

New to R and I don't have a strong enough grasp on it to work out how to achieve this yet, so stuck before the stage where I have reasonably lucid attempts at code. Thanks in advance for help!

score 3 · Accepted Answer · edited Feb 21 '23 at 22:50


A base R solution

aggregate(Presence ~ Location + Species, data = df, FUN = max, na.rm = TRUE)

#   Location Species Presence
# 1     loc1     Sp1        Y
# 2     loc2     Sp1        Y
# 3     loc1     Sp2        Y
# 4     loc2     Sp2        N

You can use max() here because character strings are compared lexicographically: "Y" sorts after "N", so max("Y", "N") returns "Y".
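To check this on the data from the question, here is a minimal sketch. The data frame construction is a reconstruction of the table shown in the question (the name df follows the answers), and the any()-based variant is an added alternative for cases where relying on sort order feels too implicit. It assumes Presence is stored as character, not as a factor, and that it only contains "Y" and "N".

```r
# Reconstruction of the question's table; `df` is the name the answers assume.
df <- data.frame(
  Location        = rep(c("loc1", "loc2"), each = 6),
  Collection_time = rep(rep(c("6-8PM", "8-10PM", "10-12PM"), each = 2), times = 2),
  Species         = rep(c("Sp1", "Sp2"), times = 6),
  Presence        = c("Y", "N", "N", "Y", "N", "N",
                      "Y", "N", "N", "N", "N", "N"),
  stringsAsFactors = FALSE  # keep Presence as character so max() works on it
)

# Character comparison is lexicographic, so "Y" counts as larger than "N".
largest <- max(c("Y", "N"))

# One row per Location/Species pair, keeping the "largest" Presence value.
by_max <- aggregate(Presence ~ Location + Species, data = df, FUN = max)

# Alternative that states the intent directly: "Y" if any observation is "Y".
by_any <- aggregate(
  Presence ~ Location + Species,
  data = df,
  FUN  = function(x) if (any(x == "Y")) "Y" else "N"
)

print(by_max)
```

Both versions should give the same four rows here. The any() form keeps working if the labels are later changed to something whose alphabetical order does not match present/absent.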

edited Feb 21 '23 at 22:50

halfer


answered Jul 17 '20 at 10:32

Darren Tsai



Thank you! Really nice one line solution. In the interest of better understanding, why is Y encoded as greater than N? Is it an alphabetical value thing? – westpier Jul 17 '20 at 12:23

@westpier Take a look at https://stackoverflow.com/questions/37914917/using-max-function-on-character-vectors-in-r and https://stat.ethz.ch/R-manual/R-devel/library/base/html/Extremes.html : "[...] Character versions are sorted lexicographically, [...]" – Martin Gal Jul 17 '20 at 12:46

score 1 · Answer 2 · answered Jul 17 '20 at 10:30


You could use dplyr, assuming your data is stored in a data.frame named df:

library(dplyr)

df %>%
  group_by(Location, Species) %>%
  summarise(Presence = ifelse(max(Presence == "Y") == 1, "Y", "N"))

returns

  Location Species Presence
  <chr>    <chr>   <chr>   
1 loc1     Sp1     Y       
2 loc1     Sp2     Y       
3 loc2     Sp1     Y       
4 loc2     Sp2     N

answered Jul 17 '20 at 10:30

Martin Gal


Thank you! I had a feeling there would be a solution with group_by and ifelse, but I really need to get a better grasp of the syntax! Really appreciated. – westpier Jul 17 '20 at 10:32
Mindblowing fact: As Darren Tsai pointed out, you can replace that whole `ifelse()` part with `max(Presence)`. – Martin Gal Jul 17 '20 at 11:19
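
For reference, the shortened dplyr version suggested in the comment could look like the sketch below. The data frame is a reconstruction of the question's table, not the asker's actual object, and `.groups = "drop"` is an addition that assumes dplyr 1.0.0 or later; it only returns an ungrouped result and can be left out.

```r
library(dplyr)

# Reconstruction of the question's table; `df` is the name the answers assume.
df <- data.frame(
  Location        = rep(c("loc1", "loc2"), each = 6),
  Collection_time = rep(rep(c("6-8PM", "8-10PM", "10-12PM"), each = 2), times = 2),
  Species         = rep(c("Sp1", "Sp2"), times = 6),
  Presence        = c("Y", "N", "N", "Y", "N", "N",
                      "Y", "N", "N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# max() on a character column picks "Y" over "N" (lexicographic comparison),
# so no ifelse() is needed as long as Presence only holds "Y" and "N".
result <- df %>%
  group_by(Location, Species) %>%
  summarise(Presence = max(Presence), .groups = "drop")

print(result)
```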
