I have two sets of dataframes. Below are the first five lines for each.

First Data frame Name: sample_sort
name                             id         supplier   usage
ABC                             10000079    811121     1
DEF                             10000182    541513     4
Supplier C                      10000484    531110     1
Supplier D                      10000523    541320     1
Supplier E                      10000592    524210     1
Supplier F                      10012711    237110     1

Second data frame Name: MBE
  id    State   total   CATEGORY
10000070    MD       5       MBE
10000182    PR       14      MBE
10000484    TX       1       MBE
10000526    MI       3       MBE
10000592    FL       1       MBE
10000680    ID       14      MBE

My actual dataset has many more columns. I want to combine the two dataframes, but I would like to bring in only the CATEGORY column from MBE. The following merge statement works:

ncombined <- merge(x = sample_sort, y = MBE, by = "id", all.x = TRUE)

But this gives me all the columns from the MBE dataset. I tried the following, in several variations, so that only the CATEGORY column gets imported, but I am not having any luck. I get an error:

ncombined <- merge(x = sample_sort, y = MBE[,c("CATEGORY")], by = "id", all.x = TRUE)

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

The final result should be as follows:

First Data frame Name: sample_sort
name                             id         supplier   usage  CATEGORY
ABC                             10000079    811121     1       MBE
DEF                             10000182    541513     4       MBE
Supplier C                      10000484    531110     1       MBE
Supplier D                      10000523    541320     1       MBE
Supplier E                      10000592    524210     1       MBE
Supplier F                      10012711    237110     1       NA
Jaap
jalsa
  • First, could you please put a little more effort into making your question readable. I have no clue as to what is supposed to be code and what is commentary. Secondly, maybe `cbind()` could work? Again it's really hard to read and understand your question. – Chase Grimm Jun 27 '16 at 19:37
  • How do you expect to merge by `id` when you're only selecting `CATEGORY`? – ytk Jun 27 '16 at 19:40
  • if you subset `MBE` to only contain the `CATEGORY` column, there is no longer any `id` column to merge on – moman822 Jun 27 '16 at 19:40
  • `ncombined <- merge(x = sample_sort, y = MBE[,c("id","CATEGORY")], by = "id", all.x = TRUE)` or `merge(x = sample_sort, y = MBE[,c(1,4)], by = "id", all.x = TRUE)` – Chase Grimm Jun 27 '16 at 19:41
  • Thanks! That worked. I was missing the "id" in the subset. – jalsa Jun 27 '16 at 20:05
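
To make the point in the comments concrete, here is a minimal sketch of why the subset loses the merge key. The three-row `MBE` below is toy data modelled on the question's sample rows, and `only_cat` and `key_and_cat` are hypothetical names used only for illustration.

```r
# Toy data modelled on the question's sample rows (an assumption, not the real dataset)
MBE <- data.frame(
  id       = c(10000070, 10000182, 10000484),
  State    = c("MD", "PR", "TX"),
  total    = c(5, 14, 1),
  CATEGORY = c("MBE", "MBE", "MBE")
)

# Selecting a single column with [, ] drops the data frame down to a plain vector
only_cat <- MBE[, c("CATEGORY")]
is.data.frame(only_cat)     # FALSE
"id" %in% names(only_cat)   # FALSE, so merge() has no "id" column to join on

# Keeping the key next to the wanted column preserves a data frame that still has "id"
key_and_cat <- MBE[, c("id", "CATEGORY")]
names(key_and_cat)
```

Since `merge()` needs the `by` column to exist in both inputs, the subset passed as `y` has to carry `id` along with `CATEGORY`.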

1 Answer

Try selecting only the columns you need (the merge key plus CATEGORY) before merging, e.g.

ncombined <- merge(x = sample_sort, y = MBE[, c("id", "CATEGORY")], by = "id", all.x = TRUE)
David
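
For anyone who wants to try this end to end, below is a rough reproducible sketch built from the six sample rows shown in the question. The data frames are typed in by hand here, so treat the column types as an assumption rather than a copy of the real dataset.

```r
# Sample rows re-typed from the question (column types are assumed)
sample_sort <- data.frame(
  name     = c("ABC", "DEF", "Supplier C", "Supplier D", "Supplier E", "Supplier F"),
  id       = c(10000079, 10000182, 10000484, 10000523, 10000592, 10012711),
  supplier = c(811121, 541513, 531110, 541320, 524210, 237110),
  usage    = c(1, 4, 1, 1, 1, 1)
)

MBE <- data.frame(
  id       = c(10000070, 10000182, 10000484, 10000526, 10000592, 10000680),
  State    = c("MD", "PR", "TX", "MI", "FL", "ID"),
  total    = c(5, 14, 1, 3, 1, 14),
  CATEGORY = "MBE"   # single value recycled to every row
)

# Left join: keep every row of sample_sort, bring in only CATEGORY from MBE.
# The subset of MBE must include the key column "id" as well.
ncombined <- merge(
  x     = sample_sort,
  y     = MBE[, c("id", "CATEGORY")],
  by    = "id",
  all.x = TRUE
)

print(ncombined)
```

Two things to expect from `merge()`: the `by` column is moved to the front of the result, and rows of `sample_sort` whose `id` has no match in `MBE` get `NA` in `CATEGORY`. With the sample rows as posted, only the ids that appear in both tables (10000182, 10000484, 10000592) pick up a category.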