How do I merge two datasets using R without getting duplicate values?

Question

I'm trying to merge two datasets in R. The first dataset is called AcademicData and the other is called MathsData. When I merge them, I get thousands of duplicate rows. Here is a picture of the code and the resulting merged table, called total. I'm trying to merge the datasets by the variable "gender".

Here's the code.

setwd("H:/Data application/x14484252-DAD Project")

MathsData <- read.csv("Math-Students.csv", header = TRUE, na.strings = "",
  stringsAsFactors = TRUE)

AcademicData <- read.csv("Academic-Performance.csv", header = TRUE,
  na.strings = "", stringsAsFactors = TRUE)

total <- merge(MathsData, AcademicData, by="gender", all.x=TRUE)

As you can see from the image, the merge creates 93,435 rows in the table called total.

Here's an image of the first dataset in Excel (Academic Dataset), and here's an image of the second dataset in Excel (MathsData).

I want to merge the two datasets by gender, without duplicate rows being created in the table called total.

Please don't post pictures of code, see [creating a great reproducible example in R](https://stackoverflow.com/a/5963610/4421870) — Mako212, Dec 18 '17 at 23:50
I think you need to be more specific about what you want the output to look like. Gender is not a unique ID variable in either dataset, so the merge you are giving is basically saying: for every row in MathsData, give a corresponding row for every matching row in AcademicData. If there are 100 girls and 200 boys in AcademicData, your merge will have 100 rows per girl and 200 rows per boy in MathsData. For more information, [R for Data Science](http://r4ds.had.co.nz/relational-data.html#mutating-joins) has some good images of what different joins look like. — Calum You, Dec 19 '17 at 00:20
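
The row multiplication described in the comment above can be reproduced on a small scale. This is a minimal sketch on toy data: the data frames `maths` and `academic` and the columns `math_score` and `grade` are hypothetical stand-ins, not the asker's files. It shows that merging on a non-unique key yields, for each gender, the product of the row counts in the two tables.

```r
# Toy stand-ins for the two datasets (hypothetical values, not the asker's data)
maths <- data.frame(gender = c("F", "F", "M"),
                    math_score = c(10, 12, 15))
academic <- data.frame(gender = c("F", "F", "F", "M", "M"),
                       grade = c("A", "B", "C", "A", "B"))

# Same call shape as in the question: gender is not unique in either table
total <- merge(maths, academic, by = "gender", all.x = TRUE)

# Rows produced = sum over genders of (rows in maths) * (rows in academic)
counts_maths <- table(maths$gender)
counts_academic <- table(academic$gender)
expected_rows <- sum(counts_maths * counts_academic[names(counts_maths)])

nrow(total)     # 8: 2 F rows * 3 F rows + 1 M row * 2 M rows
expected_rows   # 8
```

The same arithmetic applied to the real files would likely account for the 93,435 rows, although that cannot be confirmed without the data.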

score 1 · Answer 1 · answered Dec 19 '17 at 00:33

You could do this:

library(data.table)
setDT(MathsData); setDT(AcademicData)
MathsData[AcademicData, mult = "first", on = "gender", nomatch=0L]

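Here mult = "first" keeps only the first matching row of MathsData for each row of AcademicData, and nomatch = 0L drops rows that have no match. Since you did not provide reproducible data, I couldn't test the code, but I think it should work.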
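
For readers without data.table, a similar "first match only" idea can be sketched in base R. This is not the answer's code but a swapped-in technique: it deduplicates the academic table on gender before merging, so every row of the maths table is kept exactly once, which matches the all.x = TRUE in the question rather than the join direction used above. The toy data frames and the columns `math_score` and `grade` are hypothetical.

```r
# Toy stand-ins for the two datasets (hypothetical values, not the asker's data)
maths <- data.frame(gender = c("F", "F", "M"),
                    math_score = c(10, 12, 15))
academic <- data.frame(gender = c("F", "F", "F", "M", "M"),
                       grade = c("A", "B", "C", "A", "B"))

# Keep only the first row per gender so the key is unique on the right-hand side
academic_first <- academic[!duplicated(academic$gender), ]

# With a unique key on one side, each maths row matches at most one academic row
total <- merge(maths, academic_first, by = "gender", all.x = TRUE)

nrow(total) == nrow(maths)  # TRUE
```

Picking the first match is arbitrary, so this only makes sense if any one academic row per gender is an acceptable stand-in; otherwise the two tables need a key that identifies individual students.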

Santosh M.

