I have a data frame in R and want to create all possible unique subsets of it, where each subset consists of one unique pairwise combination of two columns from the original data frame. This means that if the original data frame has Y columns, I should get Y*(Y-1)/2 unique subsets. I also want the columns in each subset to keep the names they had in the original data frame. How do I do this?
-
Hi, welcome to SO. Since you are new here, you might want to read the [**about**](http://stackoverflow.com/about) and [**FAQ**](http://stackoverflow.com/faq) sections of the website to help you get the most out of it. Please also read [**how to make a great reproducible example**](http://stackoverflow.com/q/5963269/1478381) and update your question accordingly! Posted questions where the OP has not shown what they have already attempted and/or the desired output tend to get downvoted or closed. Just warning you for next time. – Simon O'Hanlon Sep 01 '13 at 13:46
-
What function is to be applied to each pair of columns to create another column in the new dataframe? – Ferdinand.kraft Sep 01 '13 at 14:20
2 Answers
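    # Return a list with one two-column data frame per pair of columns in d
    colpairs <- function(d) {
      apply(combn(ncol(d), 2), 2, function(x) d[, x])
    }
    x <- colpairs(iris)
    # lapply (rather than sapply) keeps each result as a data frame
    lapply(x, head, n = 2)
    ## [[1]]
    ##   Sepal.Length Sepal.Width
    ## 1          5.1         3.5
    ## 2          4.9         3.0
    ##
    ## [[2]]
    ##   Sepal.Length Petal.Length
    ## 1          5.1          1.4
    ## 2          4.9          1.4
    ...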
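
The question also asks that the original column names carry through. Each subset already keeps its column names, but the list elements themselves are unnamed. A minimal sketch of one way to label them, assuming a hypothetical helper `colpairs_named` and an underscore-joined naming scheme (neither is from the answer above):

```r
# Hypothetical helper: like colpairs(), but each list element is named
# after the two columns it contains (joined with "_", an assumed convention).
colpairs_named <- function(d) {
  idx <- combn(ncol(d), 2, simplify = FALSE)            # list of index pairs
  out <- lapply(idx, function(i) d[, i, drop = FALSE])  # two-column subsets
  names(out) <- vapply(idx,
                       function(i) paste(names(d)[i], collapse = "_"),
                       character(1))
  out
}

x <- colpairs_named(iris)
length(x)      # 10, i.e. choose(5, 2)
names(x)[1:2]  # "Sepal.Length_Sepal.Width" "Sepal.Length_Petal.Length"
```

With names in place, a given pair can be pulled out directly, e.g. `x[["Sepal.Length_Sepal.Width"]]`.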

Matthew Lundberg
I'd use `combn` to generate the indices of your columns, and `lapply` to take subsets of your data.frame and store them in a `list` structure, e.g.:
    # Example data
    set.seed(1)
    df <- data.frame( a = sample( 2 , 4 , replace = TRUE ) ,
                      b = runif(4) ,
                      c = sample( letters , 4 ) ,
                      d = sample( LETTERS , 4 ) )
    # Use combn to get indices
    ind <- combn( x = 1:ncol(df) , m = 2 , simplify = FALSE )
    # ind is a list of column-index pairs. Shown as a matrix (one pair per column), the pairs are:
    #     [,1] [,2] [,3] [,4] [,5] [,6]
    #[1,]    1    1    1    2    2    3
    #[2,]    2    3    4    3    4    4
    # Make subsets, combine in list
    out <- lapply( ind , function(x) df[,x] )
    #[[1]]
    #  a         b
    #1 1 0.2016819
    #2 1 0.8983897
    #3 2 0.9446753
    #4 2 0.6607978
    #[[2]]
    #  a c
    #1 1 q
    #2 1 b
    #3 2 e
    #4 2 x
    #[[3]]
    #  a d
    #1 1 R
    #2 1 J
    #3 2 S
    #4 2 L
    #[[4]]
    #          b c
    #1 0.2016819 q
    #2 0.8983897 b
    #3 0.9446753 e
    #4 0.6607978 x
    #[[5]]
    #          b d
    #1 0.2016819 R
    #2 0.8983897 J
    #3 0.9446753 S
    #4 0.6607978 L
    #[[6]]
    #  c d
    #1 q R
    #2 b J
    #3 e S
    #4 x L
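
A variation that may be worth considering is to build the pairs from the column names instead of the positions, so each pair is identified by the names the question wants preserved. This is only a sketch: the small data frame and the object names `pairs` and `out` are made up for illustration, and the last check simply confirms the Y*(Y-1)/2 count from the question.

```r
# Illustrative data (not the seeded example above)
df <- data.frame(a = c(1, 1, 2, 2),
                 b = c(0.1, 0.2, 0.3, 0.4),
                 c = letters[1:4],
                 d = LETTERS[1:4])

# combn also accepts a character vector, so pairs of names work directly
pairs <- combn(names(df), 2, simplify = FALSE)

# df[nm] (single-bracket, list-style indexing) always returns a data.frame
out <- lapply(pairs, function(nm) df[nm])

# Number of subsets matches Y * (Y - 1) / 2
Y <- ncol(df)
stopifnot(length(out) == Y * (Y - 1) / 2)
names(out[[1]])  # "a" "b"
```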

Simon O'Hanlon
-
You don't need `lapply`: `combn( x = 1:ncol(df) , m = 2 , FUN=function(x) df[,x], simplify = FALSE )` – Roland Sep 01 '13 at 15:25
-
@Roland thanks, I always forget you can supply a function to `combn`. – Simon O'Hanlon Sep 01 '13 at 19:21
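
For anyone comparing the two forms discussed in these comments, here is a small sketch that runs both and checks they agree. The data frame and the names `via_lapply` and `via_combn` are illustrative assumptions, not part of the answer.

```r
# Illustrative data
df <- data.frame(a = 1:4,
                 b = c(0.1, 0.2, 0.3, 0.4),
                 c = letters[1:4],
                 d = LETTERS[1:4])

# Two-step version: build the index pairs, then subset with lapply
ind <- combn(seq_len(ncol(df)), 2, simplify = FALSE)
via_lapply <- lapply(ind, function(x) df[, x])

# One-step version: let combn apply the function to each pair
via_combn <- combn(seq_len(ncol(df)), 2,
                   FUN = function(x) df[, x], simplify = FALSE)

# Compare the two results
identical(via_lapply, via_combn)
```

Keeping `simplify = FALSE` matters in the one-step form; without it `combn` would try to collapse the results into an array instead of returning a list of data frames.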