
I'm currently working on some ecological research data and have been trying to do this for hours. I have a data frame similar to this one, but much larger:

beetles <- data.frame(
  Area = c("A","A","A","B","B","B","C","C","D","D","D","D"),
  Year = c(1993, 1994, 1994, 1994, 1995, 1995, 1996, 1997, 1998, 1997, 1996, 1996),
  species = c("Harpalus latus","Amara ovata","Harpalus latus","Dromius agilis",
              "Amara ovata","Harpalus latus","Amara ovata","Harpalus latus",
              "Harpalus latus","Amara ovata","Dromius agilis","Harpalus latus"),
  field_season = c(1,2,2,1,2,2,1,2,3,2,1,1)
)

What I want to do is this: I have beetle data for 4 research areas, sampled over a range of years. For the analysis, I need a column giving the number of the field season within each research area in which each species was caught (the first year sampled in an area is field season 1, the next is 2, and so on). That is the column named "field_season" above, which is currently not in my data.frame. To give a bit more context: for the analysis I want to split my data set and see how much the beetle communities differed across the field seasons carried out.

I tried to use:

beetles %>% group_by(Area) %>% mutate(field_season = year ?)

but can't figure out how to do this. Please, if anyone can point me in the right direction, that would be very much appreciated.

Conny

3 Answers

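library(dplyr)  # needed for %>%

beetles %>% 
    dplyr::group_by(Area) %>% 
    # total of field_season per Area
    dplyr::summarise(sum_season = sum(field_season)) %>% 
    # attach the per-Area total back onto every original row
    dplyr::left_join(beetles)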

Like this?

Joining, by = "Area"
# A tibble: 12 x 5
   Area  sum_season  Year species        field_season
   <chr>      <dbl> <dbl> <chr>                 <dbl>
 1 A              5  1993 Harpalus latus            1
 2 A              5  1994 Amara ovata               2
 3 A              5  1994 Harpalus latus            2
 4 B              5  1994 Dromius agilis            1
 5 B              5  1995 Amara ovata               2
 6 B              5  1995 Harpalus latus            2
 7 C              3  1996 Amara ovata               1
 8 C              3  1997 Harpalus latus            2
 9 D              7  1998 Harpalus latus            3
10 D              7  1997 Amara ovata               2
11 D              7  1996 Dromius agilis            1
12 D              7  1996 Harpalus latus            1
user438383

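You can use dense_rank from dplyr. It ranks the years within each Area without gaps, so rows from the same year share the same field season number: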

library(dplyr)
beetles %>% group_by(Area) %>% mutate(field_season_ans = dense_rank(Year))

#   Area   Year species        field_season field_season_ans
#   <chr> <dbl> <chr>                 <dbl>            <int>
# 1 A      1993 Harpalus latus            1                1
# 2 A      1994 Amara ovata               2                2
# 3 A      1994 Harpalus latus            2                2
# 4 B      1994 Dromius agilis            1                1
# 5 B      1995 Amara ovata               2                2
# 6 B      1995 Harpalus latus            2                2
# 7 C      1996 Amara ovata               1                1
# 8 C      1997 Harpalus latus            2                2
# 9 D      1998 Harpalus latus            3                3
#10 D      1997 Amara ovata               2                2
#11 D      1996 Dromius agilis            1                1
#12 D      1996 Harpalus latus            1                1
Ronak Shah
  • Thanks, that's almost perfect. But it seems to work only if I already have a column named "field season", which I don't have in my original data set. – Conny Aug 06 '20 at 10:23
  • @Conny No. I don't use `field_season` anywhere in any answer. You don't need it in your dataset. I kept your original column `field_season` just for comparison purpose. – Ronak Shah Aug 06 '20 at 10:25
  • Ok, sorry you are right, but if I use str(beetles), the new column is not shown? – Conny Aug 06 '20 at 10:29
  • OK, I think I found the solution: `beetles <- beetles %>% group_by(Area) %>% mutate(field_season = dense_rank(Year))`. Thanks for your help! – Conny Aug 06 '20 at 10:46
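
The point reached in the last two comments is that mutate() returns a new data frame instead of changing beetles in place, so the result has to be assigned. A minimal sketch of this follows, assuming the question's example data without the field_season column. The trailing ungroup() is an optional addition that is not part of the original answer.

```r
library(dplyr)

# Example data from the question, without the field_season column
beetles <- data.frame(
  Area = c("A","A","A","B","B","B","C","C","D","D","D","D"),
  Year = c(1993, 1994, 1994, 1994, 1995, 1995, 1996, 1997, 1998, 1997, 1996, 1996),
  species = c("Harpalus latus","Amara ovata","Harpalus latus","Dromius agilis",
              "Amara ovata","Harpalus latus","Amara ovata","Harpalus latus",
              "Harpalus latus","Amara ovata","Dromius agilis","Harpalus latus")
)

# Without assignment the result is only printed; beetles itself is unchanged
beetles %>% group_by(Area) %>% mutate(field_season = dense_rank(Year))
"field_season" %in% names(beetles)

# Assigning the result back keeps the new column
beetles <- beetles %>%
  group_by(Area) %>%
  mutate(field_season = dense_rank(Year)) %>%
  ungroup()  # optional: drop the grouping once the column exists

str(beetles)
```

After the assignment, str(beetles) lists field_season alongside Area, Year and species.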

I am not sure whether you want to count by Area only or by both Area and Year.

  • Grouping by Area
> within(beetles, counts <- ave(field_season,Area,FUN = sum))
   Area Year        species field_season counts
1     A 1993 Harpalus latus            1      5
2     A 1994    Amara ovata            2      5
3     A 1994 Harpalus latus            2      5
4     B 1994 Dromius agilis            1      5
5     B 1995    Amara ovata            2      5
6     B 1995 Harpalus latus            2      5
7     C 1996    Amara ovata            1      3
8     C 1997 Harpalus latus            2      3
9     D 1998 Harpalus latus            3      7
10    D 1997    Amara ovata            2      7
11    D 1996 Dromius agilis            1      7
12    D 1996 Harpalus latus            1      7
  • Grouping by Area + Year
> within(beetles, counts <- ave(field_season,Area,Year, FUN = sum))
   Area Year        species field_season counts
1     A 1993 Harpalus latus            1      1
2     A 1994    Amara ovata            2      4
3     A 1994 Harpalus latus            2      4
4     B 1994 Dromius agilis            1      1
5     B 1995    Amara ovata            2      4
6     B 1995 Harpalus latus            2      4
7     C 1996    Amara ovata            1      1
8     C 1997 Harpalus latus            2      2
9     D 1998 Harpalus latus            3      3
10    D 1997    Amara ovata            2      2
11    D 1996 Dromius agilis            1      2
12    D 1996 Harpalus latus            1      2
ThomasIsCoding
  • Sorry for the confusion. I'm actually looking for a way to show the field season in an extra column for each species sampled in my four research areas. – Conny Aug 06 '20 at 09:40
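
For the column described in the comment above, a base R sketch in the same ave() style might look like the following. It is not part of the original answer. It assumes the field season is simply the rank of each distinct Year within its Area, and it uses the question's example data without the field_season column.

```r
# Example data from the question, without the field_season column
beetles <- data.frame(
  Area = c("A","A","A","B","B","B","C","C","D","D","D","D"),
  Year = c(1993, 1994, 1994, 1994, 1995, 1995, 1996, 1997, 1998, 1997, 1996, 1996),
  species = c("Harpalus latus","Amara ovata","Harpalus latus","Dromius agilis",
              "Amara ovata","Harpalus latus","Amara ovata","Harpalus latus",
              "Harpalus latus","Amara ovata","Dromius agilis","Harpalus latus")
)

# Within each Area, look up the position of each Year among that
# Area's sorted distinct years: earliest year -> 1, next -> 2, ...
beetles$field_season <- ave(
  beetles$Year, beetles$Area,
  FUN = function(y) match(y, sort(unique(y)))
)

beetles
```

Rows from the same Area and Year get the same number, which matches the behaviour of dense_rank(Year) within group_by(Area).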