I'm trying to merge two datasets in R. The first dataset is called AcademicData and the other is called MathsData. When I merge them, I get thousands of duplicate rows. Here's a picture of the code and the resulting merged table, called total. I'm trying to merge the datasets by the variable "gender".
Here's the code.
setwd("H:/Data application/x14484252-DAD Project")
MathsData <- read.csv("Math-Students.csv", header = TRUE, na.strings = c(""),
                      stringsAsFactors = TRUE)
AcademicData <- read.csv("Academic-Performance.csv", header = TRUE,
                         na.strings = c(""), stringsAsFactors = TRUE)
total <- merge(MathsData, AcademicData, by = "gender", all.x = TRUE)
As you can see from the image (Table), the merge creates 93,435 rows in the table called total.
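To make the row count easier to reason about, here is a minimal sketch with made-up toy data. The data frames maths and academic and their columns are hypothetical stand-ins, not the real CSV files. It assumes the cause is that gender is repeated in both tables, in which case merge() pairs every matching row from one table with every matching row from the other.

```r
# Toy data frames (made up for illustration; not the real CSV files)
maths <- data.frame(gender = c("F", "F", "M"), score = c(10, 12, 9))
academic <- data.frame(gender = c("F", "M", "M"), grade = c("A", "B", "C"))

# gender is not unique in either table, so every "F" row in maths is paired
# with every "F" row in academic, and likewise for "M"
toy_total <- merge(maths, academic, by = "gender", all.x = TRUE)

# Predicted row count: for each gender, (rows in maths) * (rows in academic)
n_maths <- table(maths$gender)
n_academic <- table(academic$gender)
expected_rows <- sum(n_maths * n_academic[names(n_maths)])

nrow(toy_total)  # 4
expected_rows    # 4
```

If the same arithmetic on the real gender counts in MathsData and AcademicData comes out at 93,435, that would suggest the extra rows come from this many-to-many pairing and not from duplicated records in the files.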
Here's an image of the first dataset in Excel: Academic Dataset. Here's an image of the second dataset in Excel: MathsData.
I want to merge the two datasets by gender, without duplicate rows being created in the table called total.