
I want to cbind two data frames and remove duplicated columns. For example:

df1 <- data.frame(var1=c('a','b','c'), var2=c(1,2,3))
df2 <- data.frame(var1=c('a','b','c'), var3=c(2,4,6))

cbind(df1,df2) #this creates a data frame in which column var1 is duplicated

I want to create a data frame with columns var1, var2 and var3, in which column var1 is not repeated.

Adam Smith
danilinares

3 Answers


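merge will do this: by default it joins on the columns the two data frames share (here var1), so that column appears only once in the result.

Try: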

merge(df1, df2)
kohske
    This works for the example in the question, but if the values in var1 differ between the two data frames, merge drops the unmatched rows; e.g. try `df2<-data.frame(var1=c('a','b','d'),var3=c(2,4,6))`. This matters when column names are duplicated but the data in them is not. – Maxim.K Oct 17 '14 at 13:51
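To make the caveat in the comment concrete, here is a minimal sketch using the deviating df2 from the comment. The names inner, outer and bound are illustrative only, and the cbind line assumes the rows of the two data frames are already aligned by position, which merge does not require.

```r
df1 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3))
df2 <- data.frame(var1 = c("a", "b", "d"), var3 = c(2, 4, 6))

# Default merge is an inner join on the shared column var1:
# rows "c" and "d" have no partner and are dropped.
inner <- merge(df1, df2)

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
outer <- merge(df1, df2, all = TRUE)

# Positional alternative (assumes aligned rows): bind only the columns of
# df2 whose names are not already in df1. var1 is then taken from df1,
# so any differing values in df2$var1 are silently ignored.
bound <- cbind(df1, df2[!names(df2) %in% names(df1)])
```

Which variant fits depends on whether var1 is a key to match on (merge) or just a column that happens to be repeated (cbind).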

In case you inherit someone else's dataset and end up with duplicate columns somehow and want to deal with them, this is a nice way to do it:

for (name in unique(names(testframe))) {
  if (length(which(names(testframe)==name)) > 1) {
    ## Deal with duplicates here. In this example
    ## just print name and column #s of duplicates:
    print(name)
    print(which(names(testframe)==name))
  }
}
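If "dealing with" the duplicates simply means keeping the first column for each name, a shorter sketch is possible with duplicated(). The testframe built here is a hypothetical stand-in for the inherited dataset, and the sketch assumes that columns sharing a name also hold the same data, so nothing is lost by dropping the later ones.

```r
# Hypothetical frame with a repeated column name, as produced by cbind().
testframe <- cbind(
  data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3)),
  data.frame(var1 = c("a", "b", "c"), var3 = c(2, 4, 6))
)

# duplicated() flags the second and later occurrences of each name;
# negating it keeps only the first column for every name.
deduped <- testframe[, !duplicated(names(testframe)), drop = FALSE]
```

If the duplicated columns may differ in content, the loop above is the safer starting point, since it lets you inspect each group of duplicates before deciding what to keep.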
Sam

The dplyr function mutate can take a second data frame as an argument: its columns overwrite columns of the same name in the first data frame, and columns that don't exist in the first data frame are added to the result.

> mutate(df1,df2)
   var1 var2 var3
 1    a    1    2
 2    b    2    4
 3    c    3    6
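For comparison, roughly the same overwrite-or-append behaviour can be sketched in base R with assignment by column name. This is not how dplyr implements mutate; it is only an analogous idiom, and the name combined is illustrative. Like the mutate call, it pairs rows by position and does no matching on var1.

```r
df1 <- data.frame(var1 = c("a", "b", "c"), var2 = c(1, 2, 3))
df2 <- data.frame(var1 = c("a", "b", "c"), var3 = c(2, 4, 6))

# Work on a copy so df1 itself stays unchanged.
combined <- df1

# Assigning by name overwrites var1 and appends var3.
combined[names(df2)] <- df2
```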
svendvn