I am working with productivity data in a bird species. I would like to include pair experience, defined as the total number of clutches laid to date, as a variable to investigate whether it affects productivity.

To do this I need to count the total number of clutches laid before the current clutch for each pair.

Here is the data I am working with:

   Pair.ID    laydate
1  GGM 022       <NA>
2  GGM 022       <NA>
3  GGM 022       <NA>
4  GGM 019 26/03/2017
5  GGM 019       <NA>
6  GGM 019       <NA>
7  GGM 013 18/03/2017
8  GGM 021       <NA>
9  GGM 021       <NA>
10 GGM 021       <NA>
11 GGM 009 25/12/2016
12 GGM 009 14/01/2019
13 GGM 009 20/01/2019
14 GGM 029       <NA>
15 GGM 031 09/05/2019
16 GGM 031 19/06/2019

Here is what I want to get to:

   Pair.ID    laydate experience
1  GGM 022       <NA>         NA
2  GGM 022       <NA>         NA
3  GGM 022       <NA>         NA
4  GGM 019 26/03/2017          0
5  GGM 019       <NA>         NA
6  GGM 019       <NA>         NA
7  GGM 013 18/03/2017          0
8  GGM 021       <NA>         NA
9  GGM 021       <NA>         NA
10 GGM 021       <NA>         NA
11 GGM 009 25/12/2016          0
12 GGM 009 14/01/2019          1
13 GGM 009 20/01/2019          2
14 GGM 029       <NA>         NA
15 GGM 031 09/05/2019          0
16 GGM 031 19/06/2019          1

A few things:

1) I need to keep the rows with NA, as they are where pairs have had the opportunity to breed but did not.
2) I would like to have the information added to the original dataframe, rather than creating a summary dataframe.
3) I would like to use dplyr if possible.

I have looked around and tried to wrangle these solutions to fit my purpose, but could not get them to work as needed: "Rolling Count of Events Over Time Series" and "Count events before a specific time for a series of items in R".

TomCLewis
  • It would help if you could present your data as a `data.frame` object rather than a table and so make your question reproducible. – Peter May 04 '20 at 19:48

1 Answer


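We could group by Pair.ID, specify a logical vector in i (i.e. the rows where 'laydate' is non-NA), and create the new column 'experience' by assigning (:=) the within-group row sequence minus 1, so that the first clutch of each pair gets 0. Rows with an NA 'laydate' are not selected and stay NA.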

library(data.table)
setDT(df1)[!is.na(laydate), experience := seq_len(.N) - 1, Pair.ID][]
#    Pair.ID    laydate experience
# 1: GGM 022       <NA>         NA
# 2: GGM 022       <NA>         NA
# 3: GGM 022       <NA>         NA
# 4: GGM 019 26/03/2017          0
# 5: GGM 019       <NA>         NA
# 6: GGM 019       <NA>         NA
# 7: GGM 013 18/03/2017          0
# 8: GGM 021       <NA>         NA
# 9: GGM 021       <NA>         NA
#10: GGM 021       <NA>         NA
#11: GGM 009 25/12/2016          0
#12: GGM 009 14/01/2019          1
#13: GGM 009 20/01/2019          2
#14: GGM 029       <NA>         NA
#15: GGM 031 09/05/2019          0
#16: GGM 031 19/06/2019          1

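Or with dplyr, counting only the non-NA lay dates within each pair so that an NA row ahead of a clutch does not inflate the count:

library(dplyr)
df1 %>%
  group_by(Pair.ID) %>%
  # running count of laid clutches minus 1; NA rows stay NA
  mutate(experience = if_else(is.na(laydate), NA_real_, cumsum(!is.na(laydate)) - 1)) %>%
  ungroup()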
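
Both snippets above number the clutches in the order the rows appear, so they rely on each pair's rows already being sorted by lay date. If that is not guaranteed, one option is to parse 'laydate' as a date and rank it within each pair. This is only a sketch under assumptions: the `toy` data frame and the helper column `lay` are made up for illustration, and the dates are assumed to be in day/month/year format as in the question.

```r
library(dplyr)

# Hypothetical, deliberately unsorted rows in the same layout as df1
toy <- data.frame(
  Pair.ID = c("GGM 009", "GGM 009", "GGM 009", "GGM 022", "GGM 019", "GGM 019"),
  laydate = c("20/01/2019", "25/12/2016", "14/01/2019", NA, NA, "26/03/2017"),
  stringsAsFactors = FALSE
)

toy_exp <- toy %>%
  # parse the day/month/year strings; NA strings stay NA
  mutate(lay = as.Date(laydate, format = "%d/%m/%Y")) %>%
  group_by(Pair.ID) %>%
  # rank - 1 = number of this pair's clutches with an earlier lay date;
  # na.last = "keep" leaves the no-clutch rows as NA
  mutate(experience = rank(lay, na.last = "keep", ties.method = "min") - 1) %>%
  ungroup() %>%
  select(-lay)

toy_exp
```

The rows keep their original order, and two clutches with the same lay date would get the same experience value, which may or may not be what you want.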

data

df1 <- structure(list(Pair.ID = c("GGM 022", "GGM 022", "GGM 022", "GGM 019", 
"GGM 019", "GGM 019", "GGM 013", "GGM 021", "GGM 021", "GGM 021", 
"GGM 009", "GGM 009", "GGM 009", "GGM 029", "GGM 031", "GGM 031"
), laydate = c(NA, NA, NA, "26/03/2017", NA, NA, "18/03/2017", 
NA, NA, NA, "25/12/2016", "14/01/2019", "20/01/2019", NA, "09/05/2019", 
"19/06/2019")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16"))
akrun
  • Great! Thank you so much, works just as I wanted. I knew it should only take a few lines of code. Cheers – TomCLewis May 04 '20 at 19:59