Remove columns with same value from a dataframe

Question

I have a data frame like this one:

1    1    1    K    1    K    K
2    1    2    K    1    K    K
3    8    3    K    1    K    K
4    8    2    K    1    K    K
1    1    1    K    1    K    K
2    1    2    K    1    K    K

I want to remove all the columns that contain the same value throughout, i.e. K, so that my result looks like this:

1    1    1    1    
2    1    2    1   
3    8    3    1  
4    8    2    1  
1    1    1    1 
2    1    2    1

I tried iterating over the columns in a for loop, but I didn't get anywhere. Any ideas?

Should the solution account for numeric as well as characters/factors? — Roman Luštrik, Dec 05 '11 at 16:30

Ben Bolker · Accepted Answer · 2011-12-05T17:00:41.050

To select columns with more than one value regardless of type:

uniquelength <- sapply(d,function(x) length(unique(x)))
d <- subset(d, select=uniquelength>1)

?

(Oops, Roman's question is right -- this could knock out your column 5 as well)
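To see the caveat concretely, here is a minimal sketch on the question's data. It assumes the data is read with `read.table` (so the columns get the default names V1..V7); the name `kept` is made up for this illustration.

```r
# Sample data from the question; V1..V7 are read.table's default names.
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# Number of distinct values in each column.
uniquelength <- sapply(d, function(x) length(unique(x)))

# Keep only the columns with more than one distinct value.
kept <- subset(d, select = uniquelength > 1)
names(kept)
# [1] "V1" "V2" "V3"
```

V5 holds only 1s, so it is dropped along with the K columns, which is not what the question's expected output shows.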

Maybe (edit: thanks to comments!)

isfac <- sapply(d,inherits,"factor")
d <- subset(d,select=!isfac | uniquelength>1)

or

d <- d[,!isfac | uniquelength>1]
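One thing to watch on current R versions: since R 4.0.0, `read.table()` and `data.frame()` default to `stringsAsFactors = FALSE`, so the K columns arrive as character rather than factor and the `isfac` test would keep them. A minimal sketch, assuming character columns should be treated the same way as factors (the names `iscat` and `res` are made up for this illustration):

```r
# Sample data from the question; V1..V7 are read.table's default names.
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

uniquelength <- sapply(d, function(x) length(unique(x)))

# Assumption: a column is "categorical" if it is a factor or a character vector.
iscat <- sapply(d, function(x) is.factor(x) || is.character(x))

# Drop only the categorical columns that hold a single value;
# drop = FALSE keeps the result a data frame even if one column remains.
res <- d[, !iscat | uniquelength > 1, drop = FALSE]
names(res)
# [1] "V1" "V2" "V3" "V5"
```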

edited Dec 05 '11 at 17:00

answered Dec 05 '11 at 16:33

Ben Bolker


Your subsetting doesn't work for me. Maybe `d[, !isfac | uniquelength != 1]`? – Roman Luštrik Dec 05 '11 at 16:37
... I "remember" (`?subset`) now, `subset` works on _rows_. To circumvent this, one should specify `select` explicitly, so `subset(d, select = !isfac | uniquelength > 1)`. @user976991, try that. – Roman Luštrik Dec 05 '11 at 16:49

Josh O'Brien · Answer 2 · 2011-12-05T18:55:49.417

Here's a solution that'll work to remove any replicated columns (including, e.g., pairs of replicated character, numeric, or factor columns). That's how I read the OP's question, and even if it's a misreading, it seems like an interesting question as well.

df <- read.table(text=" 
1    1    1    K    1    K    K
2    1    2    K    1    K    K
3    8    3    K    1    K    K
4    8    2    K    1    K    K
1    1    1    K    1    K    K
2    1    2    K    1    K    K")

# Need to run duplicated() in 'both directions', since it does **not**
# flag the first occurrence of a repeated column as a duplicate.
repdCols <- as.logical(duplicated(as.list(df), fromLast=FALSE) + 
                       duplicated(as.list(df), fromLast=TRUE))
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE

df[!repdCols]
#   V1 V2 V3 V5
# 1  1  1  1  1
# 2  2  1  2  1
# 3  3  8  3  1
# 4  4  8  2  1
# 5  1  1  1  1
# 6  2  1  2  1
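Note that this tests for columns that are copies of one another, not for columns that are constant. A small sketch with made-up data (the data frame and its column names are hypothetical, chosen only to show the difference):

```r
# Hypothetical data: 'a' and 'b' are identical, 'const' is constant but unique.
df <- data.frame(a = c(1, 2, 3),
                 b = c(1, 2, 3),
                 label = c("x", "y", "z"),
                 const = c(5, 5, 5))

cols <- as.list(df)
# TRUE for every column that has an identical twin somewhere in the data frame.
repdCols <- duplicated(cols) | duplicated(cols, fromLast = TRUE)

result <- df[!repdCols]
names(result)
# [1] "label" "const"
```

Both numeric twins are removed, while the constant column stays because no other column matches it.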

score 2 · Answer 3 · answered Dec 05 '11 at 18:18

Another way to do this is to use the higher-order function Filter. Here is the code:

to_keep <- function(x) any(is.numeric(x), length(unique(x)) > 1)
Filter(to_keep, d)
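For completeness, a sketch of the same call on the question's data, assuming it is read with `read.table` (default column names V1..V7; `res` is a made-up name). Because the predicate tests `is.numeric` rather than `is.factor`, it should behave the same whether the K columns are stored as factors or as character vectors.

```r
# Sample data from the question; V1..V7 are read.table's default names.
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# Keep a column if it is numeric or has more than one distinct value.
to_keep <- function(x) any(is.numeric(x), length(unique(x)) > 1)

# Filter() applies the predicate to each column and keeps those returning TRUE.
res <- Filter(to_keep, d)
names(res)
# [1] "V1" "V2" "V3" "V5"
```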

answered Dec 05 '11 at 18:18

Ramnath


score 2 · Answer 4 · edited Feb 18 '16 at 18:30

One-liner solution:

df2 <- df[sapply(df, function(x) !is.factor(x) | length(unique(x))>1 )]

edited Feb 18 '16 at 18:30

Konrad


answered Dec 05 '11 at 21:48

Wojciech Sobala
