Here's the sample data: Link. Gregor suggested:
id <- c(10020, 10020, 10020, 10020, 10020, 10020, 10020, 10020, 10021, 10021, 10021, 10021,
        10021, 10022, 10022, 10022, 10022, 20020, 20020, 20020, 20020, 20020, 20020, 20021,
        20021, 20021)
family_id <- c(1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002,
               1002, 1002, 1002, 1002, 1002, 2002, 2002, 2002, 2002, 2002, 2002,
               2002, 2002, 2002)
child_id <- c(NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, NA, NA,
              NA, NA, NA, NA, 1, 1, 1)
year <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005,
          2006, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007, 2004,
          2005, 2006)
number_of_children <- c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2,
                        2, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0)
child1_birthyear <- c(1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 2001, 2001,
                      2001, 2001, 2001, NA, NA, NA, NA, 1990, 1990, 1990, 1990,
                      1990, 1990, NA, NA, NA)
child2_birthyear <- c(1984, 1984, 1984, 1984, 1984, 1984, 1984, 1984, 2004, 2004,
                      2004, 2004, 2004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                      NA, NA, NA)
sample <- data.frame(id, family_id, child_id, year, number_of_children,
                     child1_birthyear, child2_birthyear)
Take the family with family_id 1002 as an example. The last digit of id indicates who the data comes from: if the last digit of id is 0, the data comes from the parents; if it is greater than 0, the data comes from one of their children. In the sample, 10020 is the parents' data, and 10021 and 10022 are the data of child #1 (born in 1980) and child #2 (born in 1984).
As you can see, the parents have two children, 10021 and 10022. Child 10021 has two children as of 2006: the first was born in 2001 and the second in 2004. Child 10022 doesn't have any children.
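To make the id encoding concrete, here is a small sketch in base R. The vector and column names are made up for illustration; only the ids come from the sample. It also assumes that dropping the last digit of id gives family_id, which holds in this sample but is not a stated rule:

```r
# ids taken from the sample; all other names here are hypothetical
ids <- c(10020, 10021, 10022, 20020, 20021)

last_digit <- ids %% 10        # 0 for parents, 1, 2, ... for their children
is_parent  <- last_digit == 0  # TRUE on the parents' ids
family     <- ids %/% 10       # assumption: id without its last digit is family_id

data.frame(ids, family, last_digit, is_parent)
```

In the sample itself, the same distinction is also available through child_id, which is NA on the parents' rows.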
I want to add two columns with mutate: grandparent and became_grandparent. Specifically, grandparent is an indicator that is 1 if the parents are grandparents, and became_grandparent is the year in which the parents became grandparents, that is, the earliest year in which one of the parents' children had a child.
How do I do this using dplyr or other methods? Thanks!
I have tried
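library(dplyr)

# Attempt: per family, take the smallest child1_birthyear after dropping the first row
sample <- sample %>%
  arrange(year, id) %>%
  group_by(family_id) %>%
  mutate(
    became_grandparent = min(tail(child1_birthyear, -1), na.rm = TRUE)
  ) %>%
  arrange(id, year)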
But it doesn't work.
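One likely reason: tail(x, -1) drops only the first row of each family, so the parents' own child1_birthyear (1980 for family 1002) is still included in the minimum. A possible direction, offered as a sketch rather than a definitive solution, is to take the minimum over the children's rows only and join it back to every row of the family. It rests on several assumptions: child rows are the ones with a non-missing child_id, child1_birthyear always holds the eldest child's birth year, and the indicator is constant within a family rather than changing by year. The names sample_small and first_grandchild are hypothetical, and the data is a reduced copy of the sample whose row counts per id follow the child_id and year columns:

```r
library(dplyr)

# Reduced copy of the sample data (hypothetical name), only the columns needed here
sample_small <- data.frame(
  id               = rep(c(10020, 10021, 10022, 20020, 20021), times = c(8, 5, 4, 6, 3)),
  family_id        = rep(c(1002, 2002), times = c(17, 9)),
  child_id         = rep(c(NA, 1, 2, NA, 1), times = c(8, 5, 4, 6, 3)),
  year             = c(2000:2007, 2002:2006, 2004:2007, 2002:2007, 2004:2006),
  child1_birthyear = rep(c(1980, 2001, NA, 1990, NA), times = c(8, 5, 4, 6, 3))
)

# Earliest birth year of a grandchild per family, using the children's rows only
first_grandchild <- sample_small %>%
  filter(!is.na(child_id), !is.na(child1_birthyear)) %>%
  group_by(family_id) %>%
  summarise(became_grandparent = min(child1_birthyear), .groups = "drop")

# Families without any grandchild get NA after the join, hence grandparent = 0
result <- sample_small %>%
  left_join(first_grandchild, by = "family_id") %>%
  mutate(grandparent = as.integer(!is.na(became_grandparent)))

distinct(result, family_id, grandparent, became_grandparent)
```

With this data, family 1002 gets became_grandparent = 2001 and grandparent = 1, while family 2002 gets NA and 0. If the indicator should instead switch on in the year the first grandchild is born, something like as.integer(!is.na(became_grandparent) & year >= became_grandparent) could replace the last mutate.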