0
b[,17] <- merge(a, b, by = "Date", all.y = TRUE)[ ,"RegID"]

There is a RegID column in both a and b... which produces an error.

"Error in `[.data.frame`(merge(a, b, by = "Date", all.y = TRUE),  : 
  undefined columns selected"

I could change the name in "a", but believe that there is probably an override which can fix this issue.

Can someone give a quick assist?

Sotos
  • 51,121
  • 6
  • 32
  • 66
dwdionis
  • 83
  • 7
  • Did you look at the data post-`merge` and before `[,"RegID"]`? That would likely have demonstrated that your names are now `RegID.x` and `RegID.y`. – r2evans Aug 29 '19 at 14:28
  • (BTW: `b[,17] <- merge(...)` seems really odd to me ... replacing a column (or matrix plane) with a frame.) – r2evans Aug 29 '19 at 14:29

1 Answers1

1

One option is to remove the 'RegID' in one of the datasets with (setdiff) and do the merge (assuming thee 'RegID' are the same in both datasets)

merge(a, b[setdiff(names(b), "RegID")], by = "Date", all.y = TRUE)[ ,"RegID"]

Programmatically, if there are multiple intersecting columns, find it with intersect and remove the 'Date' from it

nm1 <- setdiff(intersect(names(a), names(b)), "Date")
merge(a, b[setdiff(names(b), nm1)], by "Date",
      all.y = TRUE)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I have no idea how that works/worked, but it worked. What exactly does setdiff do? Change the name temporarily? Can't wrap my head around it. – dwdionis Aug 29 '19 at 14:17
  • @dwdionis it remove that element. e.g. `v1 <- letters[1:4]; setdiff(v1, "a")` thereby having only a single column in one of the datasets with the same column name. if there are duplicate column names, by default `data.frame` and other tabular structures won't permit to have those remain as such. Insteead it will add some suffix to add to avoid confusion. You can also check `?make.unique` or `make.names` – akrun Aug 29 '19 at 14:18
  • 1
    Unless you are certain that `all(a$RegID == b$RegID)` (for rows in common), then this is silently discarding data. Without that, this answer is (imo) not what you should be doing. (If you're fine with dropping the data, then this works fine :-) – r2evans Aug 29 '19 at 14:30
  • @ r2evans I am actually just trying to move data from a to b while matching values by Date, like an excel vlookup. The dates in both columns will differ slightly. I just want what is in a to move to b, b will technically me empty at that point, but I want to keep names the same if possible. – dwdionis Aug 29 '19 at 14:49
  • Gotcha, sounds like this will work fine. One thought: reduce `b` to only include the merge-key and columns you need instead of dealing with it post-merge. For example, `merge(a, b[,c("Date","colA","colB")], ...)` is very declarative and will reduce the chance of issues. – r2evans Aug 29 '19 at 15:04
  • akrun, I find it odd that `merge` itself has no problem with producing a frame with duplicate column names, see `merge(..., suffixes=c("",""))` (though it does warn about duplicate names). – r2evans Aug 29 '19 at 15:38
  • 1
    @r2evans yes it has `suffixes` part. In general, I wouldn't opt for having duplicate column names in my data – akrun Aug 29 '19 at 16:07
  • @r2evans merge(a, b[,c("Date","colA","colB")], ...) This still produces the same error as my original code. – dwdionis Aug 29 '19 at 16:43
  • If you literally used my example (with names `"colA"` and `"colB"`), then I apologize for not being clear: use your actual real column names. If you did replace those with your actual (relevant) column names and see the error `undefined columns selected`, then something else is going on ... you should really include sample data then (https://stackoverflow.com/questions/5963269). – r2evans Aug 29 '19 at 16:48
  • @r2evans Yes, I used my own column names, of course. – dwdionis Aug 29 '19 at 17:19
  • Okay, just checking :-). I don't think we can help any more than this without a reproducible question. Can you *please* provide sample data? It might be sufficient to paste the output from `dput(head(a))` and `dput(head(b))` (limiting the columns if they're wide, making sure at least the key and 2-3 relevant columns are in each frame). – r2evans Aug 29 '19 at 17:21
  • (And you'd be surprised, sometimes literal sample code is pasted and attempted without personalizing it. A mistake perhaps made by hasty-coders or much less frequently mis-understanding "noob" R users.) – r2evans Aug 29 '19 at 17:23