
I'm trying to create a new variable that indicates whether an event has occurred for a participant within the expected year. Please find below a sample data frame, df_raw. ID is the participant code, chil.int indicates within how many years the participant expects their first child, event indicates whether childbirth has occurred, and year indicates the year.

I thought about a variable that is 1 if the value in year plus the value in chil.int is identical to the year value in the row where event == 1. This variable should be 0 if this is not the case.

In the data frame below, for individual A and B, there should be 1's in this new column but for individual C there should be 0's. Every participant who at least once expected an event accurately should get a 1. See df_new.

Does anyone know how this could be achieved? Or do you have other ideas for how to solve this issue?

Thanks a lot!

Raw data frame:

`df_raw <- read.table(text="
                              ID  chil.int  event  year 
                 row.name11    A     3       0     2013   
                 row.name12    A     2       0     2014   
                 row.name13    A     1       0     2015  
                 row.name14    A     4       1     2016 
                 row.name15    A     3       0     2017   
                 row.name16    A     2       0     2018
                 row.name17    B     5       0     2010  
                 row.name18    B     4       0     2011   
                 row.name19    B     3       0     2012   
                 row.name20    B     2       0     2013
                 row.name21    B     NA      1     2015
                 row.name22    C     1       0     2015
                 row.name23    C     1       0     2016
                 row.name24    C     NA      0     2017
                 ",header=T)`

df_new is what I would like the final data frame to look like.

`df_new <- read.table(text="
                          ID  chil.int  event  year   new.col
             row.name11    A     3       0     2013   1 
             row.name12    A     2       0     2014   1
             row.name13    A     1       0     2015   1
             row.name14    A     4       1     2016   1
             row.name15    A     3       0     2017   1
             row.name16    A     2       0     2018   1
             row.name17    B     5       0     2010   1
             row.name18    B     4       0     2011   1
             row.name19    B     3       0     2012   1
             row.name20    B     2       0     2013   1
             row.name21    B     NA      1     2015   1
             row.name22    C     1       0     2015   0
             row.name23    C     1       0     2016   0
             row.name24    C     NA      0     2017   0
             ",header=T)`
Marie B.
    The logic is not clear, at least to me. Could you elaborate further on this calculation that checks for equality between chil.int and year? – NelsonGon Jan 18 '19 at 11:03

2 Answers


Assuming I have understood the logic correctly, here is a data.table solution.

Rephrasing the logic: if an individual (identified by ID) ever has chil.int + year %in% year[event == 1], then all of his/her rows get 1 in new.col. That is, new.col is 1 if any value of year + chil.int equals any year in which the event happens (although in this example the event happens at most once per ID).

library(data.table)
setDT(df_raw)
df_raw[, new.col := as.integer(any((chil.int + year) %in% year[event == 1])), by = ID]
df_raw

    ID chil.int event year new.col
 1:  A        3     0 2013       1
 2:  A        2     0 2014       1
 3:  A        1     0 2015       1
 4:  A        4     1 2016       1
 5:  A        3     0 2017       1
 6:  A        2     0 2018       1
 7:  B        5     0 2010       1
 8:  B        4     0 2011       1
 9:  B        3     0 2012       1
10:  B        2     0 2013       1
11:  B       NA     1 2015       1
12:  C        1     0 2015       0
13:  C        1     0 2016       0
14:  C       NA     0 2017       0
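For readers who prefer to avoid an extra package, the same grouping logic can be sketched in base R. This is only an illustration of the rule as rephrased above, not a replacement for the data.table call; the helper name hit_expected is made up for this example, and the sample data is rebuilt without the row names.

```r
# Rebuild the sample data from the question (row names omitted for brevity)
df_raw <- data.frame(
  ID       = rep(c("A", "B", "C"), times = c(6, 5, 3)),
  chil.int = c(3, 2, 1, 4, 3, 2, 5, 4, 3, 2, NA, 1, 1, NA),
  event    = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  year     = c(2013:2018, 2010:2013, 2015, 2015:2017)
)

# Hypothetical helper: returns 1 if any expected year (year + chil.int)
# matches a year in which the event occurred for this participant
hit_expected <- function(d) {
  expected    <- d$year + d$chil.int      # NA wherever chil.int is NA
  event_years <- d$year[d$event == 1]     # empty if the event never occurred
  as.integer(any(expected %in% event_years))
}

# One flag per ID, then copied back onto every row of that ID
flag <- sapply(split(df_raw, df_raw$ID), hit_expected)
df_raw$new.col <- unname(flag[as.character(df_raw$ID)])
df_raw
```

Because %in% returns FALSE rather than NA for a missing value, rows with chil.int == NA do not need special handling, and an ID with no event (such as C) ends up with 0 on every row.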
s_baldur

This is long, and I'm late to the party, but here goes. The logic for C isn't clear to me, so I used a different approach:

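library(dplyr)

# Split each year into its four digits, one column per digit (V1..V4)
yrs <- strsplit(as.character(df_raw$year), "")
Yrs1 <- matrix(unlist(yrs), byrow = TRUE, ncol = 4)
str(Yrs1)
Yrs1 <- as.data.frame(Yrs1) %>%
  mutate_if(is.character, as.numeric) %>%
  mutate(ID2 = as.factor(row_number()))
# Add the same row index to df_raw so the digit columns can be joined back
df_raw <- df_raw %>%
  mutate(ID2 = as.factor(row_number()))
df_raw %>%
  left_join(Yrs1) %>%
  mutate_if(is.factor, as.character) %>%
  mutate(V1 = as.numeric(V1), V2 = as.numeric(V2), V3 = as.numeric(V3), V4 = as.numeric(V4),
         Sum = V1 + V2 + V3 + V4 + chil.int, Sum2 = V1 + V2 + V3 + V4) %>%
  select(-ID2, -starts_with("V")) %>%
  # Note: IDs A and B are hard-coded here, and this overwrites the original event column
  mutate(event = ifelse(Sum2 + chil.int == Sum & ID %in% c("A", "B"), 1, 0))
  # %>% select(-Sum, -Sum2)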

Output:

          ID1 ID chil.int event year Sum Sum2
1  row.name11  A        3     1 2013   9    6
2  row.name12  A        2     1 2014   9    7
3  row.name13  A        1     1 2015   9    8
4  row.name14  A        4     1 2016  13    9
5  row.name15  A        3     1 2017  13   10
6  row.name16  A        2     1 2018  13   11
7  row.name17  B        5     1 2010   8    3
8  row.name18  B        4     1 2011   8    4
9  row.name19  B        3     1 2012   8    5
10 row.name20  B        2     1 2013   8    6
11 row.name21  B       NA    NA 2015  NA    8
12 row.name22  C        1     0 2015   9    8
13 row.name23  C        1     0 2016  10    9
14 row.name24  C       NA     0 2017  NA   10
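For comparison, the rule as stated in the question (year + chil.int matching the year where event == 1, per participant) could also be written in dplyr without hard-coding any IDs. This is a minimal sketch under that reading of the question, not what the code above computes; the sample data is rebuilt without row names and the result is stored in a new object, df_new.

```r
library(dplyr)

# Rebuild the sample data from the question (row names omitted for brevity)
df_raw <- data.frame(
  ID       = rep(c("A", "B", "C"), times = c(6, 5, 3)),
  chil.int = c(3, 2, 1, 4, 3, 2, 5, 4, 3, 2, NA, 1, 1, NA),
  event    = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  year     = c(2013:2018, 2010:2013, 2015, 2015:2017)
)

df_new <- df_raw %>%
  group_by(ID) %>%
  # Within each ID: does any expected year equal a year in which the event occurred?
  mutate(new.col = as.integer(any((year + chil.int) %in% year[event == 1]))) %>%
  ungroup()

df_new
```

Grouping by ID means the comparison only ever looks at a participant's own event years, and the original event column is left untouched.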
NelsonGon