
I would like to find an efficient way to split my data and merge it back. I want to split my data by sex and merge it back by household id (`idhouse`). I am doing this to get a paired dataset, so I want to remove all the "singles" (households with only one person).

What I have is this:

   idhouse idperso   sex score
1       1       1   man    18
2       1       2 woman    22
3       2       1   man    19
4       2       2 woman    24
5       3       1 woman    30

What I want is this:

 idhouse idperso_man score_man idperso_woman score_woman
     1           1        18             2          22
     2           1        19             2          24

I am getting an error using `map` and I can't figure out what it is:

library(dplyr)
library(purrr) 

dt %>% split(.$sex) %>% 
   map_df(~merge(., .$man, .$woman, by = 'idhouse'))
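Why the attempt above fails: `map_df` applies the function to each piece of the split separately, so inside the lambda `.` is already a single-sex data frame. `.$man` and `.$woman` are names of the list produced by `split`, not columns of a piece, so they evaluate to `NULL`, and no single call ever sees both pieces at once. A minimal base-R sketch of the split-then-merge idea follows. It assumes a plain numeric copy of the data (the `dput` below stores every column as a factor); the object names `pieces`, `paired` and `paired2` are illustrative only.

```r
# Plain numeric copy of the example data (assumption: numbers instead of
# the factors used in the original dput, for readability)
dt <- data.frame(
  idhouse = c(1, 1, 2, 2, 3),
  idperso = c(1, 2, 1, 2, 1),
  sex     = c("man", "woman", "man", "woman", "woman"),
  score   = c(18, 22, 19, 24, 30)
)

# Split by sex, dropping the sex column; the list is named "man", "woman"
pieces <- split(dt[, c("idhouse", "idperso", "score")], dt$sex)

# Merge the two pieces on idhouse. merge() is an inner join by default,
# so households lacking either a man or a woman (household 3) are dropped.
# suffixes renames the shared idperso/score columns.
paired <- merge(pieces$man, pieces$woman, by = "idhouse",
                suffixes = c("_man", "_woman"))
paired

# Same result written as a fold over the list, the combining counterpart
# of map(): Reduce() calls merge(pieces$man, pieces$woman) here
paired2 <- Reduce(function(x, y) merge(x, y, by = "idhouse",
                                       suffixes = c("_man", "_woman")),
                  pieces)
```

Because `merge` keeps only matching keys unless `all = TRUE` is set, the singletons disappear without any extra filtering step. If you prefer to stay in the purrr style, `purrr::reduce` plays the same role as base `Reduce`.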

Data:

dt = structure(list(idhouse = structure(c(1L, 1L, 2L, 2L, 3L), .Label = c("1", 
"2", "3"), class = "factor"), idperso = structure(c(1L, 2L, 1L, 
2L, 1L), .Label = c("1", "2"), class = "factor"), sex = structure(c(1L, 
2L, 1L, 2L, 2L), .Label = c("man", "woman"), class = "factor"), 
score = structure(c(1L, 3L, 2L, 4L, 5L), .Label = c("18", 
"19", "22", "24", "30"), class = "factor")), .Names = c("idhouse", 
"idperso", "sex", "score"), row.names = c(NA, -5L), class = "data.frame")
giac
  • Hey @giacomoV, see this post http://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr. This is a dupe which should be closed – Jacob H Sep 19 '16 at 20:22
  • @DavidArenburg, haha, I just posted an answer of yours as the dupe. – Jacob H Sep 19 '16 at 20:24
  • @JacobH yeah, totally forgot about that one :) – David Arenburg Sep 19 '16 at 20:25
  • I'm not sure about the duplicate, because it's not really about spreading but rather about merging in order to clean and pair a dataset. I have to merge one way or the other, no? – giac Sep 19 '16 at 21:03
  • @giacomoV, the code in the link I provided above generates your desired output, so I believe it is a dupe. However, I could be wrong. Why do you want to merge? – Jacob H Sep 19 '16 at 21:20
  • Because people from the same household must be matched, and when there is a singleton (like `idhouse == 3` in my example) they must be deleted. I don't understand how the proposed code solves this. Thanks. – giac Sep 19 '16 at 21:41
  • @giacomoV, yes, you're right: you can use merge to merge on `idhouse`, but it is not necessary. When you reshape the data from long to wide format, `idhouse` will be matched. As for your second point, the post I've provided will not remove the singletons. However, removing the singletons is trivial. For example, if you're using the `tidyr` approach in the post, you simply need to include `%>% slice(complete.cases(.))` at the end of the chain. – Jacob H Sep 19 '16 at 23:38
  • If you want to see how this is a dupe, here is the `data.table` solution from the mentioned dupes plus `complete.cases` at the end: `library(data.table) ; res <- dcast(setDT(dt), idhouse ~ sex, value.var = c("idperso", "score")) ; res[complete.cases(res)]`. You can do the same with dplyr/tidyr or reshape2 and add `complete.cases` at the end. – David Arenburg Sep 20 '16 at 06:01
  • OK, thank you very much. – giac Sep 20 '16 at 08:41
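For comparison, here is the `data.table` one-liner from the comments, laid out as a small script. It reshapes from long to wide with `dcast` (one row per household, with `idperso` and `score` spread by sex) and then keeps only complete rows, which is what drops the singletons. This sketch assumes the `data.table` package is installed and again uses a numeric copy of the data rather than the factors in the `dput`.

```r
library(data.table)

# Numeric copy of the example data, built directly as a data.table
dt <- data.table(
  idhouse = c(1, 1, 2, 2, 3),
  idperso = c(1, 2, 1, 2, 1),
  sex     = c("man", "woman", "man", "woman", "woman"),
  score   = c(18, 22, 19, 24, 30)
)

# Long to wide: one row per idhouse and one column per (value, sex) pair,
# named like idperso_man, score_woman, ...
# Households with only one sex get NA in the missing columns.
res <- dcast(dt, idhouse ~ sex, value.var = c("idperso", "score"))

# Keep only households where both a man and a woman are present
res <- res[complete.cases(res)]
res
```

The rows match the desired output. Only the column order may differ from the `merge` version, and you can fix that with `setcolorder` if it matters.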
