How do I remove duplicated columns from a data frame in R?

Question

I have a data.frame containing many duplicated columns, for example:

df = data.frame(a=1:10, b=1:10, c=2:11)

Is there a function (base R or dplyr) that removes duplicated columns? `unique()` only removes duplicate rows.

Unlike the question "How to remove duplicated column names in R?", my columns already have different names; it is their values that are identical.

This is not actually a duplicate of the question mentioned above: it is not about column names at all, but about duplicated column content, where the columns may have different names. — Vitali Avagyan, Oct 20 '19 at 17:28
Does this answer your question? [Identifying duplicate columns in a dataframe](https://stackoverflow.com/questions/9818125/identifying-duplicate-columns-in-a-dataframe) — Emmanuel-Lin, Dec 09 '21 at 12:21
The answer here is better than the ones in that question @Emmanuel-Lin — Neal Fultz, Dec 09 '21 at 15:38

score 5 · Accepted Answer · answered Oct 20 '19 at 16:45


An option is

df[!duplicated(as.list(df))]

Or

df[!duplicated(unclass(df))]
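Here is a quick check of this approach on the example data from the question. It is a minimal illustrative sketch; the name `dedup` is made up. Converting the data frame to a plain list of columns is what makes `duplicated()` compare columns. Called on the data frame itself, it dispatches to the data-frame method and compares rows instead.

```r
# Example data from the question: columns a and b hold identical values
df <- data.frame(a = 1:10, b = 1:10, c = 2:11)

# On a data frame, duplicated() works row-wise: one flag per row (10 here)
length(duplicated(df))

# On the list of columns it works column-wise: one flag per column
duplicated(as.list(df))            # FALSE TRUE FALSE

# Keep only the first occurrence of each distinct column
dedup <- df[!duplicated(as.list(df))]
names(dedup)                       # "a" "c"

# unclass() also yields a list of columns, so it gives the same selection
names(df[!duplicated(unclass(df))])  # "a" "c"
```

The comparison is exact and includes the column type. For example, an integer column `1:10` and a numeric column `c(1, 2, ..., 10)` holding the same values are not flagged as duplicates.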


akrun

Shouldn't this be: `df <- as.data.frame(df); df[, !duplicated(as.list(df))]` (as `df` in the example is a matrix) – JBGruber Oct 20 '19 at 16:53
@JBGruber. If `df` is a `matrix` then you may need `!duplicated(asplit(df, 2))` – akrun Oct 20 '19 at 16:54

@JBGruber. Here, it is a `data.frame` because `sleep` is a `data.frame`, and `cbind` dispatches to `cbind.data.frame` based on the structure. But if the OP had used `cbind(a = sleep[,1], b = sleep[,1], c = 1:20)`, the result would be a `matrix`, because a different method is dispatched – akrun Oct 20 '19 at 16:55

Ah! `sleep` is a built-in dataset! I did not know that. – JBGruber Oct 20 '19 at 16:56
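The comments above raise the matrix case. Here is a minimal base R sketch for it; the matrix `m` is a made-up stand-in with the same layout as the question's data. Besides the `asplit()` route from the comment, `duplicated()` has a method for matrices and arrays that takes a `MARGIN` argument, so `MARGIN = 2` compares columns directly. `drop = FALSE` keeps the result a matrix even if only one column survives.

```r
# Hypothetical matrix with the same columns as the question's data frame
m <- cbind(a = 1:10, b = 1:10, c = 2:11)
is.matrix(m)                       # TRUE: cbind() of plain vectors gives a matrix

# Column-wise duplicate flags via the matrix/array method of duplicated()
keep <- !duplicated(m, MARGIN = 2)

# The same flags via asplit(), which splits m into a list of its columns
keep2 <- !duplicated(asplit(m, 2))

# Subset columns, keeping the matrix structure
m_dedup <- m[, keep, drop = FALSE]
colnames(m_dedup)                  # "a" "c"
```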
