Suppose I have the following dataframe:
df <- data.frame(A=c(1,2,3),B=c("a","b","c"),C=c(2,1,3),D=c(1,2,3),E=c("a","b","c"),F=c(1,2,3))
> df
  A B C D E F
1 1 a 2 1 a 1
2 2 b 1 2 b 2
3 3 c 3 3 c 3
I want to filter out the columns that are identical. I know that I can do it with
DuplCols <- df[duplicated(as.list(df))]
UniqueCols <- df[ ! duplicated(as.list(df))]
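For reference, here is a quick check of what that logical index looks like on the example data. It shows why this approach alone is not enough for my purpose: duplicated() only flags the later copies of a column, so it does not tell me which columns belong together. The name dup is just a helper introduced for this illustration.

```r
df <- data.frame(A = c(1, 2, 3), B = c("a", "b", "c"), C = c(2, 1, 3),
                 D = c(1, 2, 3), E = c("a", "b", "c"), F = c(1, 2, 3))

# TRUE for every column that repeats an earlier column
dup <- duplicated(as.list(df))

names(df)[dup]   # "D" "E" "F"
names(df)[!dup]  # "A" "B" "C"
```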
In the real world my dataframe has more than 500 columns, and I know neither how many identical columns of the same kind I have nor the names of those columns. However, each column name is unique (as in df). My desired result is (optimally) a dataframe in which each row stores the column names of the identical columns of one kind. The number of columns in the DesiredResult dataframe is the maximal number of identical columns of one kind in the original dataframe; if there are fewer identical columns of another kind, the remaining cells should be filled with NA:
> DesiredResult
  X1 X2 X3
1  A  D  F
2  B  E NA
3  C NA NA
(By "identical columns of the same kind" I mean the following: in df, the columns A, D, F are identical columns of the same kind, and B, E are identical columns of the same kind.)