Apologies if the title is not an accurate description of what I'm doing.

I am trying to construct every possible hypothetical team for a fantasy sports competition. This means combining all available players, each of whom has characteristics such as the team they are on, their position, and their salary; these characteristics limit which players can be on a single team together. The trouble I am having is finding a memory-efficient way to combine them all.

I made an example dataset:

 player_pool <- data.frame(id = seq(1,30), salary = seq(1,30), team = rep(LETTERS[seq(from=1, to=5)],6), position = rep(LETTERS[seq(from=1, to=5)],6))

Out of these 30 players I would like to choose every team of 8 that has at least 1 player from each of the 5 positions, no more than 3 players from the same team, and a combined salary of less than 50.

For example, this would be a valid team:

 id salary team position
  1      1    A        A
  2      2    B        B
  3      3    C        C
  4      4    D        D
  5      5    E        E
  6      6    A        A
  7      7    B        B
  8      8    C        C

It has at most two players from any one team (within the limit of 3), at least 1 player in each position, and a total salary of 36, which is under the cap.
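To make the three rules concrete, here is a minimal sketch of a validity check applied to the example team. The helper name `is_valid_team` and its arguments are hypothetical, introduced only for illustration, and the salary rule is read strictly as "less than 50" (use `<=` if the cap is meant to be inclusive).

```r
# Example pool from the question; stringsAsFactors = FALSE keeps team/position as character
player_pool <- data.frame(
  id = 1:30, salary = 1:30,
  team = rep(LETTERS[1:5], 6), position = rep(LETTERS[1:5], 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the players with the given ids form a valid team
is_valid_team <- function(ids, pool, size = 8, cap = 50, max_per_team = 3) {
  picked <- pool[pool$id %in% ids, ]
  n_positions <- length(unique(pool$position))          # 5 in the example pool
  nrow(picked) == size &&
    sum(picked$salary) < cap &&                         # strict: "less than 50"
    length(unique(picked$position)) == n_positions &&   # every position covered
    max(table(picked$team)) <= max_per_team             # no team over-represented
}

is_valid_team(1:8, player_pool)                       # the example team above
is_valid_team(c(1, 6, 11, 16, 2, 3, 4, 5), player_pool)  # four players from team A
```

The first call checks the example team; the second picks four players from team A, so it should fail the per-team rule even though its salary is under the cap.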

I have been trying to implement a function that steps through all ~6MM combinations one at a time using the package iterpc, looking up and calculating salary/team numbers at each step. This lets me fit everything into memory at each step, but it is incredibly slow and inefficient -- it amounts to creating every possible team and applying the rules in succession.

Any alternate approaches would be great!

verybadatthis
  • 8 GB RAM, but I could try and run it on another computer with 16 GB, or on a server with 60 GB -- though the server is shared so it is generally less than that amount. – verybadatthis Jul 22 '15 at 19:58

1 Answer

Setup. Adding up the seven lowest-paid players, you get 28. With a cap of 50, this means that no one with a salary above 22 can be on the team.

pool <- subset(player_pool,salary<=22)
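The arithmetic behind that cutoff can be spelled out as a short sketch. The variable names `cap`, `cheapest_seven`, and `max_salary` are introduced here for illustration and are not part of the original answer.

```r
# Example pool from the question
player_pool <- data.frame(
  id = 1:30, salary = 1:30,
  team = rep(LETTERS[1:5], 6), position = rep(LETTERS[1:5], 6),
  stringsAsFactors = FALSE
)

cap <- 50
# Cheapest possible set of seven teammates: 1 + 2 + ... + 7 = 28
cheapest_seven <- sum(sort(player_pool$salary)[1:7])
# Any eighth player costing more than this pushes the team over the cap
max_salary <- cap - cheapest_seven                     # 50 - 28 = 22

pool <- subset(player_pool, salary <= max_salary)

choose(nrow(player_pool), 8)   # combinations before pruning: 5852925
choose(nrow(pool), 8)          # combinations after pruning:   319770
```

Pruning the pool from 30 to 22 players cuts the number of candidate teams by a factor of about 18, which is what makes the brute-force approach practical.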

Finding combos. From here, I would take the obvious route instead of looking for efficiency:

  1. Identify all combos of rows

    rs <- combn(seq(nrow(pool)),8)
    
  2. Test conditions

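    # TRUE for each column of rs (one candidate team) that passes all three rules
    good_rs <- with(pool,apply(rs,2,function(x){
      sum(salary[x]) <= 50 &&                  # salary cap
      length(unique(position[x])) == 5 &&      # all 5 positions covered
      max(lengths(split(x,team[x]))) <= 3      # at most 3 players per team
    }))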
    

Results. It runs fast enough (under a second), and I see 339 matching combos:

length(which(good_rs))
# [1] 339
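Since `good_rs` only flags which columns of `rs` pass, a further step is needed to see the teams themselves. The following is a sketch of one way to do that, rebuilding the objects above so it runs on its own. The names `valid`, `first_team`, and `team_salaries` are assumptions added for illustration, and `lengths()` requires R 3.2.0 or later.

```r
# Rebuild the example pool and the pruned pool from the answer
player_pool <- data.frame(
  id = 1:30, salary = 1:30,
  team = rep(LETTERS[1:5], 6), position = rep(LETTERS[1:5], 6),
  stringsAsFactors = FALSE
)
pool <- subset(player_pool, salary <= 22)

# All 8-row combinations, one per column, and the rule check from the answer
rs <- combn(seq(nrow(pool)), 8)
good_rs <- with(pool, apply(rs, 2, function(x) {
  sum(salary[x]) <= 50 &&
    length(unique(position[x])) == 5 &&
    max(lengths(split(x, team[x]))) <= 3
}))

# Keep only the passing columns; each column holds row indices into pool
valid <- rs[, good_rs, drop = FALSE]

# Look up the players of the first valid team
first_team <- pool[valid[, 1], ]
first_team

# Total salary of every valid team
team_salaries <- apply(valid, 2, function(x) sum(pool$salary[x]))
range(team_salaries)
```

Keeping the teams as columns of row indices is compact: the full player details are looked up in `pool` only when a particular team is needed.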
Frank
  • `combnPrim` would be slightly [faster](http://stackoverflow.com/questions/26828301/faster-version-of-combn). – akrun Jul 22 '15 at 20:15
  • It is from the gRbase package. – akrun Jul 22 '15 at 20:16
  • @akrun Oh ok. Can't install that on 3.2.0 (since its dependency RBGL is not available). Yeah, could be faster as it has special treatment for `seq(n)` cases – Frank Jul 22 '15 at 20:20
  • This looks great, but I'm having a bit of trouble replicating it. Is `lengths` a typo? I changed it to `length`, but when I run it I end up with only `FALSE` values for `good_rs`. `pool <- subset(player_pool,salary<=22) rs <- combn(seq(nrow(pool)),8) good_rs <- with(pool,apply(rs,2,function(x){ sum(salary[x]) <= 50 && length(unique(position[x])) == 5 && max(length(split(x,team[x]))) <= 3 })) table(good_rs) good_rs FALSE 319770` Sorry for the ugly formatting -- no line breaks in comments – verybadatthis Jul 22 '15 at 20:21
  • @verybadatthis You'll need R 3.2.0+ for `lengths(x)`. It's a (faster-running) shortcut to `sapply(x,length)` – Frank Jul 22 '15 at 20:22
  • If you don't have `lengths` (like me) you can replace the condition `max(lengths(split(x,team[x])))<=3` with `max(table(team[x]))<=3` – Bridgeburners Jul 22 '15 at 20:32
  • Thank you, that is perfect! Impressed by my fellow Stack Overflow members as always – verybadatthis Jul 22 '15 at 20:40