Remove identical columns while keeping one from each group

Question

df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 5, 6, 3), c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 4, 4), e = c(2, 3, 5, 6, 3))  


  a b c d e
  1 2 1 2 2
  2 3 2 3 3
  3 5 3 4 5
  4 6 4 4 6
  5 3 5 4 3

My question is rather simple, but I cannot get around it myself. Is there a simple way to remove all the duplicated columns except one (in each 'group', i.e. in this case we have groups of (a, c) and (b, e))?

My expected output:

  a b d
  1 2 2
  2 3 3
  3 5 4
  4 6 4
  5 3 4

Due to a specific constraint I cannot convert the data frame into a matrix, so the solution has to work on a data frame directly, possibly a large one.

Slightly, for me it is a dataframe, and the solutions there are too slow. — arg0naut91, Aug 04 '18 at 15:20

score 2 · Answer 1 · answered Aug 04 '18 at 15:05

We can transpose the data frame and then use the duplicated function to select the non-duplicated columns.

df[, !duplicated(t(df))]
#   a b d
# 1 1 2 2
# 2 2 3 3
# 3 3 5 4
# 4 4 6 4
# 5 5 3 4
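
For readers wondering why the transpose is needed: `duplicated()` on a matrix compares rows by default, so `t(df)` first turns each column into a row. The sketch below assumes the `df` from the question and shows the intermediate logical vector. The `drop = FALSE` argument is an addition, not part of the answer; it keeps the result a data frame even when only one column survives.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 5, 6, 3),
                 c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 4, 4),
                 e = c(2, 3, 5, 6, 3))

# t(df) coerces the data frame to a matrix with one row per original column;
# duplicated() then flags every row that repeats an earlier one.
is_dup <- as.vector(duplicated(t(df)))
is_dup
# c and e are flagged: FALSE FALSE TRUE FALSE TRUE

# drop = FALSE (an addition, not part of the answer) guards against the
# result collapsing to a plain vector when a single column remains.
res_t <- df[, !is_dup, drop = FALSE]
names(res_t)
```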

www


Thanks a lot, it's a good hack - but quite slow for larger dataframes; apologies for not specifying that condition in the question. – arg0naut91 Aug 04 '18 at 15:22
Yes, it could be slow because converting from data frame to matrix takes extra time. – www Aug 04 '18 at 15:24

score 2 · Accepted Answer · answered Aug 04 '18 at 15:06

How about:

df[!duplicated(as.list(df))]

  a b d
1 1 2 2
2 2 3 3
3 3 5 4
4 4 6 4
5 5 3 4
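
This works because a data frame is a list of column vectors, so `duplicated()` compares whole columns without any matrix conversion. One caveat, offered as an observation rather than as part of the answer: list elements are compared strictly, so an integer column and a double column holding the same values are not treated as duplicates, whereas the transpose approach coerces everything to one type first. A minimal sketch (the name `mixed` is illustrative):

```r
# Two columns with equal values but different storage types.
mixed <- data.frame(a = 1:3, b = c(1, 2, 3))

dup_list <- duplicated(as.list(mixed))       # compares columns as stored
dup_mat  <- as.vector(duplicated(t(mixed)))  # compares after matrix coercion

dup_list  # FALSE FALSE: integer and double columns are not identical
dup_mat   # FALSE TRUE: both rows are double once in the matrix

# Converting integer columns to double first makes the list approach agree.
mixed[] <- lapply(mixed, function(col) if (is.integer(col)) as.numeric(col) else col)
kept <- names(mixed[!duplicated(as.list(mixed))])
kept
```

If the columns in your data all share one type, as in the question, the two approaches should select the same columns.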

sbha


Very fast for dataframes, even with ~ 1000 columns & million rows. – arg0naut91 Aug 04 '18 at 15:21
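
To check the speed difference on your own machine, a rough timing sketch along the following lines could be used. The sizes and the name `big` are arbitrary assumptions chosen to keep the run short, and no particular timings are implied.

```r
set.seed(1)
n_rows <- 1e4
n_cols <- 100

# Build a data frame of random columns, then append exact copies of the first 20.
big <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
big <- cbind(big, big[1:20])
names(big) <- paste0("x", seq_along(big))

# Time both approaches; each returns a logical vector of columns to keep.
t_list <- system.time(keep_list <- !duplicated(as.list(big)))
t_mat  <- system.time(keep_mat  <- !as.vector(duplicated(t(big))))

# Both should agree on which columns to keep when all columns share a type.
stopifnot(identical(keep_list, keep_mat))

rbind(as_list = t_list, transpose = t_mat)[, "elapsed"]
```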
