0

I guess this is a basic question but I am not sure what I am doing wrong. Consider the following (smalle subsample) of a dataset of only same sex individuals.

df<-data.frame( c(20050401, 20050401, 20100101, 20120401, 20100101, 20120401), c(20100101, 20100101, 20181201, 20150401, 20181201, 20150401), c("A1", "A1", "A1", "A2", "A1", "A2"),
                c("q1", "q2", "q1", "q3", "q2", "q4"), c(1,1,1, 1, 1, 1))

colnames(df)<-c("begindate", "enddate", "householdnr", "id", "sex-dummy")

Each household number corresponds to two individuals which are given by the id variable. Now each couple of two ids is identified by both the householdnumber and the begindate (see for instance household nr A1 which has multiple entries) Now I would like to rewrite this dataset in the following format

householdnr id_1 id_2 begindate    enddate    sex-dummy
A1           q1   q2   20050401    20100101       1
A1           q1   q2   20100101    20181201       1   
A2           q3   q4   20120401    20150401       1

I think I should use the pivot wider command here, but for some reason I cannot obtain the right dataset as displayed above. I think my mistake is in that the identifying variables are both the householdnr and the begindate for each household, but I am not sure. I appreciate any help.

Albert
  • 119
  • 7

1 Answers1

1

You could create a row number for each couple marked by their householdnr and begindate and then use pivot_wider.

library(dplyr)

df %>%
  group_by(householdnr, begindate) %>%
  mutate(row  =row_number()) %>%
  tidyr::pivot_wider(names_from = row, values_from = id, names_prefix = 'id_')


#  begindate  enddate householdnr `sex-dummy` id_1  id_2 
#      <dbl>    <dbl> <chr>             <dbl> <chr> <chr>
#1  20050401 20100101 A1                    1 q1    q2   
#2  20100101 20181201 A1                    1 q1    q2   
#3  20120401 20150401 A2                    1 q3    q4   
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213