I think this is a relatively easy and straightforward question. Or at least I hope so.
I am trying to merge some data together, Dataset A and Dataset B. I am joining the data together by a column that is in both datasets. This called is a list of participant numbers and is called part_no
. However while my line of code below works in terms of merging Datasets A and B together, I end up losing some of my rows and hence some of my participants.
I am relatively confident (albeit not 100% sure) this may be because while the part_no column is common to both datasets, not all participants are included in each (i.e. some participants are in Dataset A, some are in Dataset B, and some are in both). This being the case, I still want the merge to go ahead with R filling in any missing values with NA or something similar.
My line of code is
merged_data = merge(DatasetA, DatasetB, by = "part_no")
Can anyone suggest a quick fix?
Thanks.
Edit: Each participant has multiple rows. The line merged_data = merge(DatasetA, DatasetB, by = "part_no", all = TRUE)
almost does the job but not quite, as the merged dataset only contains one row for each participant that is not associated with both original datasets. This makes my merged data look something like:
Part_no Column A Column B Column C
- - - - - - - - - - - - - - - - - - - - - - -
1 boy 17 Nick
1 boy 8 Nick
1 boy 45 Nick
1 boy 5 Nick
2 girl 14 June
2 girl 32 June
2 girl 21 June
2 girl 9 June
3 boy 11 John
4 boy 30 Alan
4 boy 27 Alan
4 boy 4 Alan
4 boy 15 Alan