mergedata <- merge (dataset1, dataset2, by.x="personalid")

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
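Here is a minimal sketch of how this error arises. The data frames `x` and `y` are made up and are not the OP's data. `merge()` looks up `by.x` among the column names of the first data frame. If the name is not there, `merge()` stops with this message.

```r
# Hypothetical toy data: the key column in `x` is named "id", not "personalid"
x <- data.frame(id = c("P1", "P2"), age = c(39, 50))
y <- data.frame(personalid = c("P1", "P2"), hours = c(40, 13))

# by.x = "personalid" is checked against names(x), where it does not exist,
# so merge() signals an error; capture its message instead of aborting
err <- tryCatch(
  merge(x, y, by.x = "personalid"),
  error = function(e) conditionMessage(e)
)
print(err)
```

The same check runs on `by.y` against the second data frame. A misspelled key in either data frame therefore produces this error.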

PKumar
charan kumar

1 Answer


The OP specified only `by.x`. If the key column has the same name in both data frames, use `by`:

merge(dataset1, dataset2, by="personalid") 
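The following is a small self-contained sketch of the `by` form. The two data frames are hypothetical toy data that only borrow the OP's object and column names. With `by`, the same key name is used for both inputs.

```r
# Hypothetical toy data frames that share a key column named "personalid"
dataset1 <- data.frame(personalid = c("P1", "P2", "P3"),
                       age = c(39, 50, 38))
dataset2 <- data.frame(personalid = c("P2", "P3", "P4"),
                       hours.per.week = c(13, 40, 40))

# `by` supplies the same key name for both data frames (by.x = by.y = by);
# the default is an inner join, so only ids present in both are kept
mergedata <- merge(dataset1, dataset2, by = "personalid")
print(mergedata)
```

Only `P2` and `P3` appear in both inputs, so the result has two rows.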

If the key columns have different names, you need to specify `by.y` as well:

merge(dataset1, dataset2, by.x="personalid", by.y = "somethingelse") 
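The sketch below shows the `by.x`/`by.y` form on hypothetical toy data. The second data frame's key column is called `id` here purely for illustration, standing in for `"somethingelse"` above.

```r
# Hypothetical toy data: the key is "personalid" in one frame, "id" in the other
dataset1 <- data.frame(personalid = c("P1", "P2", "P3"),
                       age = c(39, 50, 38))
dataset2 <- data.frame(id = c("P2", "P3", "P4"),
                       hours.per.week = c(13, 40, 40))

# by.x names the key in dataset1 and by.y names the key in dataset2;
# the merged key column keeps the by.x name
mergedata <- merge(dataset1, dataset2, by.x = "personalid", by.y = "id")
print(mergedata)
```

Both `by.x` and `by.y` must name columns that actually exist in their respective data frames. Otherwise the `fix.by` error from the question appears.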
akrun
  • It's still showing an error – charan kumar Apr 21 '18 at 10:40
  • @charankumar If you don't show a small reproducible example, it is difficult for others to know what exactly the problem is. – akrun Apr 21 '18 at 10:41
  • dataset1 <- read.csv("Census Income Data Set1.csv")
    dataset2 <- read.csv("Census Income Data Set2.csv")
    str(dataset1)
    str(dataset2)
    colnames(dataset1)[1] <- "personalid"
    colnames(dataset2)[1] <- "perosnalid"
    merge(dataset1, dataset2, by.x="personalid", by.y = "somethingelse")
    – charan kumar Apr 21 '18 at 10:41
  • this is the full program – charan kumar Apr 21 '18 at 10:41
  • @charankumar Your code shows that both datasets have the same 'personalid' column, so why not use `by = 'personalid'`? – akrun Apr 21 '18 at 10:42
  • dataset1 <- read.csv("Census Income Data Set1.csv")
    dataset2 <- read.csv("Census Income Data Set2.csv")
    str(dataset1)
    str(dataset2)
    colnames(dataset1)[1] <- "personalid"
    colnames(dataset2)[1] <- "perosnalid"
    merge(dataset1, dataset2, by="personalid")
    – charan kumar Apr 21 '18 at 10:44
  • @charankumar What is the new error? What does `str` show? – akrun Apr 21 '18 at 10:45
  • str shows that it's correct, but when it comes to the merge cell, it shows an error – charan kumar Apr 21 '18 at 10:49
  • @charankumar Can you update your post with the output of `str` and also `dput` of a small dataset? – akrun Apr 21 '18 at 10:50
  • str(dataset2)
    'data.frame': 48842 obs. of 6 variables:
     $ perosnalid    : Factor w/ 48842 levels "P1","P10","P100",..: 1 11112 22223 33334 43288 44399 45510 46621 47732 2 ...
     $ capital.gain  : int 2174 0 0 0 0 0 0 0 14084 5178 ...
     $ capital.loss  : int 0 0 0 0 0 0 0 0 0 0 ...
     $ hours.per.week: int 40 13 40 40 40 40 16 45 50 40 ...
     $ native.country: Factor w/ 42 levels " Cambodia"," Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
     $ class         : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
    – charan kumar Apr 21 '18 at 10:52
  • @charankumar There is an edit button you can use to edit your post. – akrun Apr 21 '18 at 10:53
  • @charankumar, can you add the results of `str` to your question (by clicking the *edit* button, bottom left of your question)? It is difficult to see what is going on when they are added as comments. Akrun's answer should work. – user20650 Apr 21 '18 at 11:16
  • Huh, [from your comment](https://stackoverflow.com/questions/49954874/error-in-fix-byby-x-x-by-must-specify-a-uniquely-valid-columnmergedata#comment86926380_49954891) above, the variable is `perosnalid`, not `personalid`; make sure of your spelling. – user20650 Apr 21 '18 at 11:18
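The spelling problem can be caught before merging by checking that the key name exists in both data frames. The sketch below uses hypothetical toy data that reproduces the `perosnalid` misspelling. In the OP's script, the equivalent fix is to spell the name correctly in `colnames(dataset2)[1] <- ...`.

```r
# Hypothetical toy data reproducing the misspelling from the comments:
# the key in dataset2 is "perosnalid", not "personalid"
dataset1 <- data.frame(personalid = c("P1", "P2"), age = c(39, 50))
dataset2 <- data.frame(perosnalid = c("P1", "P2"), hours.per.week = c(40, 13))

# Check whether the key name exists in each data frame before merging
key <- "personalid"
found <- c(dataset1 = key %in% names(dataset1),
           dataset2 = key %in% names(dataset2))
print(found)  # dataset2 is FALSE, which is what triggers the fix.by error

# Fix the spelling (here by renaming the first column), then merge with `by`
names(dataset2)[1] <- key
mergedata <- merge(dataset1, dataset2, by = key)
print(mergedata)
```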