
I have a data frame where each column has a unique name, but the content of several columns is identical. The columns with identical content are all factor variables, and their names end the same way (e.g. .x or .y). My goal is to collapse all columns with the same ending (.x or .y) into a single column.

Most solutions I have found combine multiple data frames, but I haven't found one that does this within a single data frame. Below is an example script showing what my data frame currently looks like and the desired output.

# generate some data (seed fixed so the example is reproducible)
set.seed(1)
dv1 <- rnorm(6)
dv2 <- rnorm(6)
dv3 <- rnorm(6)

# current dataframe
DF <- data.frame(dv1, 
                 iv1.x = as.factor(sort(rep(letters[1:2], 3))), 
                 iv1.y = as.factor(1:6),
                 dv2, 
                 iv2.x = as.factor(sort(rep(letters[1:2], 3))), 
                 iv2.y = as.factor(1:6),
                 dv3, 
                 iv3.x = as.factor(sort(rep(letters[1:2], 3))), 
                 iv3.y = as.factor(1:6)
                 )

# desired dataframe 
DF.cbmd <- data.frame(dv1,
                      dv2, 
                      dv3,
                      iv1.x = as.factor(sort(rep(letters[1:2], 3))), 
                      iv1.y = as.factor(1:6)
                      )
Tiberius

2 Answers


If they are truly duplicate columns, there's no need to merge them; you can simply remove the duplicates:

dfUnique <- DF[!duplicated(as.list(DF))]
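A small reproducible sketch of this approach: it rebuilds the example data with a fixed seed (an assumption, for reproducibility), drops the repeated columns, and then, as an optional extra step that is not part of the answer, moves the dv columns to the front so the column order matches DF.cbmd. The reordering step assumes the response columns are exactly the ones whose names start with "dv".

```r
set.seed(1)  # fixed seed so the example is reproducible
dv1 <- rnorm(6); dv2 <- rnorm(6); dv3 <- rnorm(6)
grp <- factor(rep(c("a", "b"), each = 3))  # same values as sort(rep(letters[1:2], 3))
id  <- factor(1:6)

DF <- data.frame(dv1, iv1.x = grp, iv1.y = id,
                 dv2, iv2.x = grp, iv2.y = id,
                 dv3, iv3.x = grp, iv3.y = id)

# as.list() turns the data frame into a list of columns, so duplicated()
# flags every column whose contents repeat an earlier column
dfUnique <- DF[!duplicated(as.list(DF))]
names(dfUnique)  # -> dv1, iv1.x, iv1.y, dv2, dv3

# Optional: put the dv columns first (order() keeps ties in their original order)
dfUnique <- dfUnique[order(!grepl("^dv", names(dfUnique)))]
names(dfUnique)  # -> dv1, dv2, dv3, iv1.x, iv1.y
```

Because this compares column contents rather than names, it works regardless of how the columns are named, but it will also drop any two columns that happen to be identical by coincidence.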
Sven

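Your data frame looks like the result of a merge (merge() appends the suffixes .x and .y to clashing column names by default). The ideal fix would be to handle this at the previous step, when merging. Alternatively, you can strip everything up to and including the last . in each column name (names without a dot, such as dv1, stay unchanged) and then drop the columns whose stripped names are duplicates, i.e.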

DF[!duplicated(gsub('.*\\.', '', names(DF)))]
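To see why this works, the sketch below prints the keys that gsub() produces and keeps the first column for each key. Since this approach looks only at names, it adds a safety check (an addition, not part of the answer) that every dropped column is identical to the kept column sharing its key. The example data are rebuilt with a fixed seed, which is an assumption for reproducibility.

```r
set.seed(1)  # fixed seed so the example is reproducible
grp <- factor(rep(c("a", "b"), each = 3))
id  <- factor(1:6)
DF <- data.frame(dv1 = rnorm(6), iv1.x = grp, iv1.y = id,
                 dv2 = rnorm(6), iv2.x = grp, iv2.y = id,
                 dv3 = rnorm(6), iv3.x = grp, iv3.y = id)

# The greedy ".*" swallows everything up to the LAST ".", leaving the suffix
key <- gsub(".*\\.", "", names(DF))
key  # -> dv1, x, y, dv2, x, y, dv3, x, y

keep <- !duplicated(key)

# Safety check: each dropped column must match the first (kept) column
# with the same key; match() returns the index of that first occurrence
dropped_ok <- vapply(which(!keep), function(i) {
  identical(DF[[i]], DF[[match(key[i], key)]])
}, logical(1))
stopifnot(all(dropped_ok))

res <- DF[keep]
names(res)  # -> dv1, iv1.x, iv1.y, dv2, dv3
```

The check matters because two columns can share a suffix without sharing content; the duplicated(as.list(DF)) approach compares content directly, so it doesn't need one.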
Sotos