df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 5, 6, 3), c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 4, 4), e = c(2, 3, 5, 6, 3))  


  a b c d e
  1 2 1 2 2
  2 3 2 3 3
  3 5 3 4 5
  4 6 4 4 6
  5 3 5 4 3 

My question is rather simple, but I cannot work it out myself. Is there a simple way to remove duplicated columns while keeping one column from each group of duplicates? In this case the groups are (a, c) and (b, e).

My expected output:

  a b d
  1 2 2
  2 3 3
  3 5 4
  4 6 4
  5 3 4

Because of a specific constraint I cannot convert the data frame into a matrix, so the solution has to work on a data frame, possibly a large one.

arg0naut91

2 Answers


We can transpose the data frame and then use the duplicated function to select the non-duplicated columns.

df[, !duplicated(t(df))]
#   a b d
# 1 1 2 2
# 2 2 3 3
# 3 3 5 4
# 4 4 6 4
# 5 5 3 4
www
  • Thanks a lot, it's a good hack - but quite slow for larger dataframes; apologies for not specifying that condition in the question. – arg0naut91 Aug 04 '18 at 15:22
  • Yes, it could be slow because converting from data frame to matrix takes extra time. – www Aug 04 '18 at 15:24
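To make the point in the comments concrete, here is a minimal sketch (not part of the original answer) showing that `t()` turns the data frame into a matrix before `duplicated()` runs, which is where the extra cost comes from. The variable names `m`, `keep` and `result` are illustrative, and the `drop = FALSE` argument is an addition so that the result stays a data frame even if only one column survives.

```r
# Example data from the question
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 5, 6, 3),
                 c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 4, 4),
                 e = c(2, 3, 5, 6, 3))

# t() coerces the data frame to a matrix; each row of m is one column of df
m <- t(df)
is.matrix(m)   # TRUE

# duplicated() on a matrix compares rows, i.e. the original columns
keep <- !duplicated(m)

# drop = FALSE keeps the result a data frame even with a single column left
result <- df[, keep, drop = FALSE]
names(result)  # "a" "b" "d"
```

Because the whole data frame is copied into a matrix, this route also forces all columns to a common type when the columns are of mixed types.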

How about:

df[!duplicated(as.list(df))]

  a b d
1 1 2 2
2 2 3 3
3 3 5 4
4 4 6 4
5 5 3 4
sbha
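As a rough illustration of why the list-based approach stays on the data frame, the sketch below wraps it in a small helper. The name `drop_duplicate_columns` is hypothetical and not from the answer; the helper simply treats the data frame as a list of columns, so no matrix conversion is involved.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# keep the first column of every group of identical columns.
drop_duplicate_columns <- function(data) {
  stopifnot(is.data.frame(data))
  # as.list() exposes the columns as list elements; duplicated() flags
  # every element that repeats an earlier one.
  data[!duplicated(as.list(data))]
}

# Example data from the question
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 5, 6, 3),
                 c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 4, 4),
                 e = c(2, 3, 5, 6, 3))

mask <- duplicated(as.list(df))
mask           # FALSE FALSE TRUE FALSE TRUE: c repeats a, e repeats b

out <- drop_duplicate_columns(df)
names(out)     # "a" "b" "d"
```

Single-bracket indexing with one index selects columns and always returns a data frame, so no `drop = FALSE` is needed here.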