0

I have a data frame of this form

familyid  Year  memberid value
1         2000  1        5
1         2000  2        6
2         2000  1        5
3         2000  1        7
3         2000  2        8
1         2002  1        5
1         2002  2        5
2         2002  1        6
3         2002  1        7
3         2002  2        8

I want to transform it in the following way

familyid  Year  value_1 value_2
1         2000   5       6       
2         2000   5       NA
3         2000   7       8
1         2002   5       5
2         2002   6       NA
3         2002   7       8

In other words I want to group my obs by familyid and year and then, for each memberid, create a column reporting the corresponding value of the last column. Whenever that family has only one member, I want to have NA in the value_2 column associated with member 2 of the reference family.

To do this I usually and succesfully use the following code

setDT(df)
dfnew<-data.table::dcast(df, Year + familyid ~ memberid, value.var=c("value"))

Unfortunately this time I get something like this

familyid  Year  value_1 value_2
1         2000   1       1       
2         2000   1       0
3         2000   1       1
1         2002   1       1
2         2002   1       0
3         2002   1       1

In other words I get a new dataframe with 1 whenever the member exists (indeed column value_1 contains all 1 since all families have at least one member), 0 whenever the member does not exist, regardless the actual value in column "value". Does anybody know why this happens? Thank you for your time.

Marco Mello
  • 175
  • 1
  • 9
  • I cannot reproduce this with your example. I get your desired result. I suspect there are ID combinations that are not unique in your actual data. – Roland Oct 17 '18 at 11:13
  • Oh I see. This may be the problem. Thank you so much! – Marco Mello Oct 17 '18 at 11:15
  • Related / possible duplicate: [*dcast error: ‘Aggregation function missing: defaulting to length’*](https://stackoverflow.com/q/33051386/2204410) – Jaap Oct 17 '18 at 11:17

1 Answers1

0

With tidyverse:

 library(tidyverse)
 df<-read.table(text="familyid  Year  memberid value
 1         2000  1        5
 1         2000  2        6
 2         2000  1        5
 3         2000  1        7
 3         2000  2        8
 1         2002  1        5
 1         2002  2        5
 2         2002  1        6
 3         2002  1        7
 3         2002  2        8",header=T)

 df%>%
   group_by(familyid,Year)%>%
   spread(memberid,value)%>%
   arrange(Year)%>%
   mutate_at(c("1", "2"),.funs = funs( ifelse(is.na(.),0,1)))
# A tibble: 6 x 4
# Groups:   familyid, Year [6]
  familyid  Year   `1`   `2`
     <int> <int> <dbl> <dbl>
1        1  2000    1.    1.
2        2  2000    1.    0.
3        3  2000    1.    1.
4        1  2002    1.    1.
5        2  2002    1.    0.
6        3  2002    1.    1.
jyjek
  • 2,627
  • 11
  • 23