0

I think this is a relatively easy and straightforward question. Or at least I hope so.

I am trying to merge some data together, Dataset A and Dataset B. I am joining the data together by a column that is in both datasets. This called is a list of participant numbers and is called part_no. However while my line of code below works in terms of merging Datasets A and B together, I end up losing some of my rows and hence some of my participants.

I am relatively confident (albeit not 100% sure) this may be because while the part_no column is common to both datasets, not all participants are included in each (i.e. some participants are in Dataset A, some are in Dataset B, and some are in both). This being the case, I still want the merge to go ahead with R filling in any missing values with NA or something similar.

My line of code is

merged_data = merge(DatasetA, DatasetB, by = "part_no")

Can anyone suggest a quick fix?

Thanks.

Edit: Each participant has multiple rows. The line merged_data = merge(DatasetA, DatasetB, by = "part_no", all = TRUE) almost does the job but not quite, as the merged dataset only contains one row for each participant that is not associated with both original datasets. This makes my merged data look something like:

Part_no     Column A    Column B    Column C
- - - - - - - - - - - - - - - - - - - - - - - 
1             boy        17          Nick
1             boy        8           Nick
1             boy        45          Nick
1             boy        5           Nick
2             girl       14          June
2             girl       32          June
2             girl       21          June
2             girl       9           June
3             boy        11          John
4             boy        30          Alan
4             boy        27          Alan
4             boy        4           Alan
4             boy        15          Alan
Rstudent
  • 75
  • 9
  • 1
    @27ϕ9, you left out a comma before `all = TRUE` argument. You can also post it as answer. – LC-datascientist Nov 18 '20 at 21:39
  • You have a better chance of getting a helpful response if you provide sample input data and the expected output. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Ritchie Sacramento Nov 18 '20 at 22:20

0 Answers0