0

Data set1:

ID Name     Territory   Sales
1  Richard  NY            59
8  Sam      California    44

Data set2:

Terr ID  Name   Comments
 LA   5   Rick    yes
 MH   11  Oly     no

I want final data set to have columns of 1st data set only and identify Territory is same as Terr and does not bring forward Comments column.

Final data should look like:

ID Name     Territory  Sales
1  Richard  NY           59
8  Sam      California   44
5  Rick     LA           NA
11 Oly      MH           NA

Thanks in advance

Jaap
  • 81,064
  • 34
  • 182
  • 193
sg94
  • 11
  • 1
  • 3
    `rbind(set1, setNames(set2[,-4], names(set1)))` – Jaap May 08 '18 at 10:59
  • or `library(data.table); rbindlist(list(set1, set2[,-4]))` – Jaap May 08 '18 at 11:04
  • My columns are also in not a same order. I have just edited the question, please answer now. Thanks ! – sg94 May 08 '18 at 11:06
  • `rbind(set1, setNames(set2[,c(2,3,1)], names(set1)))` – Jaap May 08 '18 at 11:07
  • Please check the question again.. i just changed it. Since my columns are not in same order – sg94 May 08 '18 at 11:07
  • I want to identify it on column names, like if Territory = Terr, append it... something like that.. this is because my data set might not have same order every time i receive the input files.. So want to do it on column names – sg94 May 08 '18 at 11:09
  • `rbind(set1, setNames(set2[, names(sort(unlist(sapply(names(set2), agrep, x = names(set1)))))], names(set1)))`? – Jaap May 08 '18 at 11:16
  • Getting an error : undefined columns selected. Sorry i am new to this.. In case my questions are silly – sg94 May 08 '18 at 11:25
  • Seems to be more complicated than at first sight. I've reopened the question & posted a possible solution – Jaap May 09 '18 at 08:36

1 Answers1

0

A possible solution:

# create a named vector with names from 'set2' 
# with the positions of the matching columns in 'set1'
nms2 <- sort(unlist(sapply(names(set2), agrep, x = names(set1))))

# only keep the columns in 'set2' for which a match is found
# and give them the same names as in 'set1'
set2 <- setNames(set2[names(nms2)], names(set1[nms2]))

# bind the two dataset together

# option 1:
library(dplyr)
bind_rows(set1, set2)

# option 2:
library(data.table)
rbindlist(list(set1, set2), fill = TRUE)

which gives (dplyr-output shown):

  ID    Name  Territory Sales
1  1 Richard         NY    59
2  8     Sam California    44
3  5    Rick         LA    NA
4 11     Oly         MH    NA

Used data:

set1 <- structure(list(ID = c(1L, 8L), 
                       Name = c("Richard", "Sam"),
                       Territory = c("NY", "California"),
                       Sales = c(59L, 44L)),
                  .Names = c("ID", "Name", "Territory", "Sales"), class = "data.frame", row.names = c(NA, -2L))
set2 <- structure(list(Terr = c("LA", "MH"),
                       ID = c(5L, 11L),
                       Name = c("Rick", "Oly"),
                       Comments = c("yes", "no")),
                  .Names = c("Terr", "ID", "Name", "Comments"), class = "data.frame", row.names = c(NA, -2L))
Jaap
  • 81,064
  • 34
  • 182
  • 193