I have a data.frame containing many duplicated columns, for example:

df = data.frame(a=1:10, b=1:10, c=2:11)

Is there a function (base R or dplyr) that removes duplicated columns? unique() only removes duplicate rows.

Unlike in "How to remove duplicated column names in R?", my columns already have different names, but their values are identical.

Neal Fultz
  • This is actually not a duplicate of the question mentioned above: this one is not about column names but about duplicated column content, which may appear under different names. – Vitali Avagyan Oct 20 '19 at 17:28
  • Does this answer your question? [Identifying duplicate columns in a dataframe](https://stackoverflow.com/questions/9818125/identifying-duplicate-columns-in-a-dataframe) – Emmanuel-Lin Dec 09 '21 at 12:21
  • The answer here is better than the ones in that question @Emmanuel-Lin – Neal Fultz Dec 09 '21 at 15:38

1 Answer

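One option is to treat the data frame as a list of columns and keep only the columns that do not duplicate an earlier one: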

df[!duplicated(as.list(df))]

Or

df[!duplicated(unclass(df))]
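
As a quick check on the example data from the question (a minimal sketch; the names `keep` and `deduped` are only for illustration), the logical index marks `b` as a duplicate of `a`, so only `a` and `c` remain:

```r
df <- data.frame(a = 1:10, b = 1:10, c = 2:11)

# TRUE for each column that is not a copy of an earlier column
keep <- !duplicated(as.list(df))

# Single-bracket indexing with a logical vector selects columns
deduped <- df[keep]
names(deduped)
```

The first occurrence is kept, so the surviving column is named `a` rather than `b`.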
akrun
  • Shouldn't this be `df <- as.data.frame(df); df[, !duplicated(as.list(df))]` (as `df` in the example is a matrix)? – JBGruber Oct 20 '19 at 16:53
  • @JBGruber. If `df` is a `matrix` then you may need `!duplicated(asplit(df, 2))` – akrun Oct 20 '19 at 16:54
  • @JBGruber. Here, it is a `data.frame` because `sleep` is a `data.frame`, so `cbind` dispatches `cbind.data.frame` based on the structure. If the OP had used `cbind(a = sleep[,1], b = sleep[,1], c = 1:20)`, the result would be a `matrix`, because a different method is dispatched. – akrun Oct 20 '19 at 16:55
  • Ah! `sleep` is a built-in dataset! I did not know that. – JBGruber Oct 20 '19 at 16:56
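
The matrix case raised in the comments can be sketched as follows. This is only an illustration, assuming the same values as the question's example but bound with `cbind` on plain vectors (which gives a matrix); the names `m`, `keep_m` and `deduped_m` are made up for the example:

```r
# cbind() on plain vectors returns a matrix, not a data.frame
m <- cbind(a = 1:10, b = 1:10, c = 2:11)
is.matrix(m)

# asplit(m, 2) splits the matrix into a list of its columns,
# so duplicated() compares whole columns with each other
keep_m <- !duplicated(asplit(m, 2))

# drop = FALSE keeps the result a matrix even if one column survives
deduped_m <- m[, keep_m, drop = FALSE]
colnames(deduped_m)
```

For a matrix, `as.list()` would split the data into individual cells rather than columns, which is why the column-wise split is needed here.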