
Here's the sample data, pasted inline as Gregor suggested (it was originally a link):

id <- c(10020,10020,10020,10020,10020,10020,10020,10020,10021,10021,10021,10021,
        10021,10022,10022,10022,10022,20020,20020,20020,20020,20020,20020,20021,
        20021,20021)
family_id<- c(1002,1002,1002,1002,1002, 1002, 1002, 1002, 1002, 1002, 1002, 1002,
             1002, 1002, 1002, 1002, 1002, 2002, 2002, 2002, 2002, 2002, 2002, 
             2002, 2002, 2002 )
child_id<- c(NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, NA, NA, 
            NA, NA, NA, NA, 1, 1, 1 )
year<- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005,
         2006, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007, 2004,
         2005, 2006 ) 
number_of_children<- c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2,
                       2, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0 )
child1_birthyear<- c(1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 2001, 2001,
                     2001, 2001, 2001, NA, NA, NA, NA, 1990, 1990, 1990, 1990, 
                     1990, 1990, NA, NA, NA )
child2_birthyear<- c(1984, 1984, 1984, 1984, 1984, 1984, 1984, 1984, 2004, 2004,
                     2004, 2004, 2004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                     NA, NA, NA )
sample<- data.frame(id,family_id,child_id,year,number_of_children,
                    child1_birthyear,child2_birthyear) 

Here's the data for the family with family_id 1002. The last digit of id indicates who a row comes from: if the last digit is 0, the row comes from the parents; if it is greater than 0, the row comes from one of the parents' children. In the sample, 10020 is the parents' data, and 10021 and 10022 are the data for child #1 (born in 1980) and child #2 (born in 1984).

As you can see, the parents have two children, 10021 and 10022. Child 10021 has 2 children as of 2006. Their first child was born in 2001 and their second was born in 2004. Child 10022 doesn't have children.
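
The id convention described above can be checked with integer arithmetic. This is a minimal sketch, assuming every id is the family_id followed by exactly one digit; the names `ids`, `last_digit`, `is_parent_row` and `derived_family_id` are illustrative and not part of the original data.

```r
# Illustrative ids taken from the sample: one parent id and the child ids per family
ids <- c(10020, 10021, 10022, 20020, 20021)

# The last digit is the remainder after dividing by 10
last_digit <- ids %% 10

# 0 marks the parents' rows; anything greater marks a child's rows
is_parent_row <- last_digit == 0

# Dropping the last digit recovers the family identifier (assumed convention)
derived_family_id <- ids %/% 10

print(data.frame(ids, last_digit, is_parent_row, derived_family_id))
```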

I want to add two columns with mutate(): grandparent and became_grandparent. Specifically, grandparent is an indicator that is 1 if the parents are grandparents. became_grandparent is the year in which the parents became grandparents, that is, the earliest year in which any of the parents' children had a child.

How do I do this using dplyr or other methods? Thanks!

I have tried

sample<-sample%>%
  arrange(year,id)%>%
  group_by(family_id)%>%
  mutate(
    became_grandparent=min(tail(child1_birthyear,-1),na.rm=TRUE)
  )%>%
  arrange(id,year)

But it doesn't work.
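
One likely reason the attempt above returns the wrong year: within a family, child1_birthyear also holds the birth years of the parents' own children (1980 for family 1002), so the minimum picks those up instead of the grandchildren's. Below is a minimal sketch of one possible fix, not a definitive answer. It assumes the single-digit id convention holds, that both new columns should be constant within a family, and that grandparent should be 0 when no grandchild is recorded. The names `demo`, `first_year_or_na` and `result` are illustrative; `demo` is a compact rebuild of the relevant sample columns.

```r
library(dplyr)

# Compact rebuild of the sample (assumption: 10021 has 5 rows, 10022 has 4 rows,
# matching the year and child_id vectors)
n_rows <- c(8, 5, 4, 6, 3)
demo <- data.frame(
  id               = rep(c(10020, 10021, 10022, 20020, 20021), times = n_rows),
  year             = c(2000:2007, 2002:2006, 2004:2007, 2002:2007, 2004:2006),
  child1_birthyear = rep(c(1980, 2001, NA, 1990, NA), times = n_rows),
  child2_birthyear = rep(c(1984, 2004, NA, NA, NA), times = n_rows)
)
demo$family_id <- demo$id %/% 10

# Hypothetical helper: earliest non-missing year, or NA when every value is missing
# (plain min(..., na.rm = TRUE) would return Inf with a warning in that case)
first_year_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

result <- demo %>%
  group_by(family_id) %>%
  mutate(
    # Only rows reported by the children (last digit of id > 0) describe grandchildren
    became_grandparent = first_year_or_na(
      c(child1_birthyear[id %% 10 > 0], child2_birthyear[id %% 10 > 0])
    ),
    # Family-level indicator: 1 if any grandchild birth year was found
    grandparent = as.integer(!is.na(became_grandparent))
  ) %>%
  ungroup() %>%
  arrange(id, year)

print(distinct(result, family_id, grandparent, became_grandparent))
```

If grandparent should instead switch to 1 only from the year the first grandchild is born, `as.integer(!is.na(became_grandparent) & year >= became_grandparent)` is one way to express that.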

  • You'll get help a lot quicker if you put sample data in the question in a copy/pasteable way instead of in a link. It also makes your question a better resource for future users, as that link could go bad at any time. – Gregor Thomas Apr 10 '23 at 18:25
  • @GregorThomas The link is to a .csv file. It would take a long time to code into a dataframe. read.csv() in this case is faster. – Ludwig Gershwin Apr 10 '23 at 18:29
  • `dput(your_data[1:5, ])` makes copy/pasteable code to recreate the first 5 rows of `your_data`, including the class information for each column. You can read more at our FAQ on [How to make a great reproducible example in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Gregor Thomas Apr 10 '23 at 19:11

0 Answers