
I would like to keep the duplicated columns and delete the columns that are unique. The duplicated columns have the same values but different names.

x1 = rnorm(10)
x2 = rnorm(10)
x3 = x1
x4 = rnorm(10)
x5 = x2
x6 = rnorm(10)
x7 = rnorm(10)
df = data.frame(x1,x2,x3,x4,x5,x6,x7)

From here I would keep columns x1, x2, x3, and x5.

There is also a similar question for Python: Get rows that have the same value across its columns in pandas

user_n

1 Answer


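Use duplicated() on a transposed version of your data, since by default the function checks for duplicated rows, not columns. A single call flags only the later copies, so combine it with a second call using fromLast=TRUE to flag the first occurrences as well.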

df[duplicated(t(df)) | duplicated(t(df), fromLast=TRUE)]

#            x1         x2          x3         x5
#1   1.82633666  1.2271611  1.82633666  1.2271611
#2  -1.33187496  0.9654359 -1.33187496  0.9654359
#...
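To see why both calls are needed, here is a minimal sketch on a small made-up data frame (the column names a, b, a_copy and d are assumptions for illustration, not part of the question's data). It shows what each call flags on its own:

```r
# Small deterministic data frame; a_copy holds the same values as a
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), a_copy = c(1, 2, 3), d = c(7, 8, 9))

m <- t(df)                              # columns of df become rows of a matrix
fwd <- duplicated(m)                    # flags only the later copy (a_copy)
bwd <- duplicated(m, fromLast = TRUE)   # flags only the earlier copy (a)
keep <- fwd | bwd                       # flags every column that has a twin

names(df)[fwd]    # "a_copy"
names(df)[bwd]    # "a"
names(df)[keep]   # "a" "a_copy"
```

Note that t() coerces the data frame to a matrix, so this is simplest to reason about when all columns share one type, as in the question.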

As @Frank notes, you could also treat df as a list of column vectors:

df[duplicated(c(df)) | duplicated(c(df), fromLast=TRUE)]

Or you could explicitly call the array method, specifying columns to be checked for duplicates:

df[duplicated.array(df, MARGIN=2) | duplicated.array(df, MARGIN=2, fromLast=TRUE)]
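If this is needed in more than one place, the list-based variant could be wrapped in a small helper. The sketch below is only one way to do it; the function name keep_duplicated_columns and the seed are assumptions made for illustration, and as.list() is used in place of c() to make the conversion to a list of columns explicit.

```r
# Hypothetical helper: keep only the columns whose values also occur in
# at least one other column of the same data frame.
keep_duplicated_columns <- function(df) {
  cols <- as.list(df)  # one list element per column, no matrix coercion
  keep <- duplicated(cols) | duplicated(cols, fromLast = TRUE)
  df[keep]             # list-style indexing always returns a data frame
}

# Rebuild the question's data (seed chosen arbitrarily for reproducibility)
set.seed(1)
x1 <- rnorm(10)
x2 <- rnorm(10)
x4 <- rnorm(10)
df <- data.frame(x1, x2, x3 = x1, x4, x5 = x2, x6 = rnorm(10), x7 = rnorm(10))

result <- keep_duplicated_columns(df)
names(result)   # "x1" "x2" "x3" "x5"
```

Because the columns are compared as list elements, each keeps its own type, which may matter for data frames that mix numeric and character columns.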
thelatemail