1

Say I have the following dataframe

x <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
y <- c(1,4,3,5,5,7,2,2,6,8)
df <- data.frame(x,y)

say x represents players and y represents goals. I want all the subsets of players whose sum of goals is 10, say

{p1,p3,p9},{p3,p6},{p7,p8,p9}...
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
areddy
  • 373
  • 3
  • 7
  • 18
  • 1
    Not sure if this helps, but it seems related: http://stackoverflow.com/q/31572497/1191259 You probably want to impose a team size to keep things feasible (fine for only 10 players, but if you had another 20 players with 0 or 1 goals, it'll get ugly). ... And another: http://stackoverflow.com/q/32855755/1191259 – Frank Oct 25 '15 at 12:48
  • Incidentally, your `cbind` call is completely redundant. Leave it out. – Konrad Rudolph Oct 25 '15 at 14:25

3 Answers3

5

1) lpSolve This can be done using integert linear programming. We use an objective of c(0, ..., 0) and a one row matrix consisting of y as the constraint matrix. The right hand side of the constraint must equal n, i.e. 10.

library(lpSolve)
y <- c(1,4,3,5,5,7,2,2,6,8)
n <- length(y)
k <- sum(cumsum(sort(y)) <= n) + 1 # upper bound to no of players in group
out <- lp(objective = numeric(n), 
   const.mat = matrix(y, 1), const.dir = "==", const.rhs = n,
   all.bin = TRUE, num.bin.solns = sum(choose(n, 1:k)))

# solution vector seems to have junk at end so truncate it and reshape to matrix
soln <- matrix(head(out$solution, n * out$num.bin.solns), n)

It found a total of 19 solutions:

> out
Success: the objective function is 0 
         19 solutions returned

> out$num.bin.solns
[1] 19

> dim(soln)
[1] 10 19

The columns of soln are the feasible solutions. For example, the first solun is players 1, 2 and 4:

> soln[, 1]
 [1] 1 1 0 1 0 0 0 0 0 0
> which(soln[, 1]==1)
[1] 1 2 4

We can list the solutions as strings like this:

> x <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
> apply(soln == 1, 2, function(v) toString(x[v]))
 [1] "p1, p2, p4"     "p4, p5"         "p3, p4, p7"     "p1, p4, p7, p8"
 [5] "p1, p2, p3, p8" "p1, p2, p3, p7" "p1, p3, p9"     "p3, p4, p8"    
 [9] "p1, p2, p5"     "p3, p5, p7"     "p2, p9"         "p3, p5, p8"    
[13] "p3, p6"         "p1, p5, p7, p8" "p1, p6, p7"     "p1, p6, p8"    
[17] "p7, p8, p9"     "p8, p10"        "p7, p10"  

2) wle A second approach is to create all 10^2 subsets of 1:10 as binary vectors v and then select out those for which y %*% v == 10 (where y is from the question). This approach results in concise code and would be OK as long as y is not too long.

library(wle)
m <- sapply(0:(2^10-1), function(x) binary(x, 10)$binary)
soln2 <- m[, y %*% m == 10]

Use the same method as in (1) to turn that into a vector of strings if that form is preferred.

Update: Some corrections and improvements and added (2).

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you so much for the answer... can i set minimum number of players in team? say minimun 3 players – areddy Oct 25 '15 at 14:48
  • To constrain the number of players in the solution to k=3: k <- 3; out <- lp(objective = numeric(n), const.mat = rbind(y, 1), const.dir = c("==", "<="), const.rhs = c(n, k), all.bin = TRUE, num.bin.solns = choose(n,k)) Alternately use the soln in the answer and just select out those columns with no more than k ones, i.e. k <- 3; soln <- soln[, colSums(soln) <= k] where soln is as calcualted in the answer. – G. Grothendieck Oct 25 '15 at 15:21
2

You can use this bruteforce approach: Result would be:

 [1] "p2,p9"       "p3,p6"       "p4,p5"       "p7,p10"      "p8,p10"      "p1,p2,p4"    "p1,p2,p5"   
 [8] "p1,p3,p9"    "p1,p6,p7"    "p1,p6,p8"    "p3,p4,p7"    "p3,p4,p8"    "p3,p5,p7"    "p3,p5,p8"   
[15] "p7,p8,p9"    "p1,p2,p3,p7" "p1,p2,p3,p8" "p1,p4,p7,p8" "p1,p5,p7,p8"

Data:

x<-c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
y<-c(1,4,3,5,5,7,2,2,6,8)
df<-data.frame(x=x,y=y, stringsAsFactors = FALSE)
df$id <- seq_len(nrow(df)) # Adding an ID column

Get all possible combinations of upto max_comb elements

max_comb <- nrow(df)
my_combn <- function(m, x){
  combn(x, m, simplify = FALSE)
}
dat <- lapply(1:max_comb, my_combn, df$id)

Set names to the picked combinations

combn_names <- function(ind, vec, collapse = ", "){
  paste(vec[ind], collapse = collapse)
}

set_list_combn_names <- function(l, vec){
  setNames(l, lapply(l, combn_names, vec = vec))
}
dat <- lapply(dat, set_list_combn_names, df$x)

Check if sum is equal to x=10 and output names of combinations

sum_equal_x <- function(ind, vec, x){
  sum(vec[ind]) == x
}
names(which(unlist(lapply(dat, lapply, sum_equal_x, df$y, 10))))

Result:

> names(which(unlist(lapply(dat, lapply, sum_equal_x, df$y, 10))))
 [1] "p2,p9"       "p3,p6"       "p4,p5"       "p7,p10"      "p8,p10"      "p1,p2,p4"    "p1,p2,p5"   
 [8] "p1,p3,p9"    "p1,p6,p7"    "p1,p6,p8"    "p3,p4,p7"    "p3,p4,p8"    "p3,p5,p7"    "p3,p5,p8"   
[15] "p7,p8,p9"    "p1,p2,p3,p7" "p1,p2,p3,p8" "p1,p4,p7,p8" "p1,p5,p7,p8"
Rentrop
  • 20,979
  • 10
  • 72
  • 100
1

You could do something like this:

require(utils)

x<-c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
y<-c(1,4,3,5,5,7,2,2,6,8)
df<-data.frame(cbind(x=x,y=y))

search.val <- 10
max.num    <- length(x)

all.comb <- lapply(1:max.num, function(n){ combn(x,n) })

# Calcualte sum
#   Foreach combination length 1:n
sums <- lapply(all.comb, function(comb.mat){
  # Foreach combination of length n
  apply(comb.mat,2,function(col){
    sum(as.numeric( df[which(df$x %in% col),]$y ))
  })
})

# Find which combinations have sum 10
vals <- lapply(1:max.num,function(i){
  sum.vect <- sums[[i]]
  inds     <- which(sum.vect == search.val)
  lapply(inds, function(j){
    all.comb[[i]][,j]
  })
})

sum.of.10 <- unlist(vals,recursive=FALSE)

This is a brute force solution where all sums of all combinations upp to length max.num is calculated using the combn function.

RmIu
  • 4,357
  • 1
  • 21
  • 24