Joining two datasets and replicating observations

Question

I have two data sets that need to be joined, with observations duplicated to avoid NAs, as I'm going to run a regression on the data. However, I'm struggling to make it work.

I currently have two data sets similar to these:

Children

Participant_id   Family_no  Score_c   Gender_c
A1               1          300         .5
B1               1          400        -.5
C1               2          500        -.5
D1               2          450         .5
E1               2          600         .5
F1               3          350        -.5

Parents

Participants_id  Family_no  Score_p  Gender_p  Q_score
A2               1          200        .5       3
B2               1          350       -.5       3.5
C2               2          300        .5       2
D2               3          250       -.5       3.9
E2               3          300       -.5       4

I would like to join them to create a data set in which each child is paired with each parent in the same family. E.g. if a family has two parents and one child, the child's data appears twice (and vice versa), and if there are two parents and two children, each observation appears twice for that family. I.e. like this (the participant column is not necessary):

Participant_id  Family_no  Score_c  Score_p  Gender_c  Gender_p  Q_score 
A1+A2           1          300      200      .5        .5        3
A1+B2           1          300      350      .5       -.5        3.5
B1+A2           1          400      200     -.5        .5        3
B1+B2           1          400      350     -.5       -.5        3.5
C1+C2           2          500      300     -.5        .5        2
D1+C2           2          450      300      .5        .5        2
E1+C2           2          600      300      .5        .5        2
F1+D2           3          350      250     -.5       -.5        3.9
F1+E2           3          350      300     -.5       -.5        4

I'd ideally like to use tidyverse but am open to other suggestions!

Thanks in advance,

Julia

This is just a `full_join` using `dplyr`. Use `full_join(children, parents, by='Family_no')` — MrFlick, Mar 06 '18 at 17:12
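
For reference, here is a minimal sketch of what the comment suggests, run on the example data from the question. The data frame names `children`, `parents`, and `joined` are assumptions, as is the final step that pastes the two ID columns together (the question says that column is optional). Column names, including `Participants_id` in the parents table, are copied as posted.

```r
library(dplyr)

# Example data copied from the question
children <- data.frame(
  Participant_id = c("A1", "B1", "C1", "D1", "E1", "F1"),
  Family_no      = c(1, 1, 2, 2, 2, 3),
  Score_c        = c(300, 400, 500, 450, 600, 350),
  Gender_c       = c(.5, -.5, -.5, .5, .5, -.5)
)

parents <- data.frame(
  Participants_id = c("A2", "B2", "C2", "D2", "E2"),
  Family_no       = c(1, 1, 2, 3, 3),
  Score_p         = c(200, 350, 300, 250, 300),
  Gender_p        = c(.5, -.5, .5, -.5, -.5),
  Q_score         = c(3, 3.5, 2, 3.9, 4)
)

# Joining on Family_no pairs every child with every parent in the same family.
# Recent dplyr versions (1.1.0 and later) warn about a many-to-many match here;
# that is the intended behaviour in this case.
joined <- full_join(children, parents, by = "Family_no") %>%
  # Optional: build the combined ID shown in the desired output
  mutate(Participant_id = paste(Participant_id, Participants_id, sep = "+")) %>%
  select(Participant_id, Family_no, Score_c, Score_p,
         Gender_c, Gender_p, Q_score)

print(joined)
```

Note that `full_join` keeps families that appear in only one of the tables and fills the missing side with NA. If such families exist in the real data and rows with NAs are not wanted for the regression, `inner_join` with the same arguments would keep only families present in both tables.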
