I am having a difficult time getting nested loops right. I feel like I am initializing variables that don't need to be initialized. This code does give me the desired output, but what can be done to improve it? The `rep(vec[i], length(vec))` seems janky to me.

I can also imagine other ways to solve this permutation problem, such as using `replicate` or lists, but I am struggling more with how to set up nested loops properly and need help in that context.

vec <- c("red", "blue", "green", "orange")

col2 <- vector()
final <- data.frame(col1 =NULL, col2 = NULL)

for (i in 1:length(vec)){
  for (j in 1:length(vec)){
    col1 <- rep(vec[i], length(vec))
    col2[j] <- vec[j]
    temp.df <- cbind(col1, col2)
  }
  final <- rbind(final, temp.df)
}

final

     col1   col2
1     red    red
2     red   blue
3     red  green
4     red orange
5    blue    red
6    blue   blue
7    blue  green
8    blue orange
9   green    red
10  green   blue
11  green  green
12  green orange
13 orange    red
14 orange   blue
15 orange  green
16 orange orange
Josh
  • `final <- data.frame(col1=rep(vec, each=length(vec)), col2=vec)` or `expand.grid(vec, vec)` – jogo Nov 16 '20 at 15:16
  • Your instinct is right that this is a slow way to run a loop. Allocating memory (which `cbind` and `rbind` do on every call) is slow regardless of how much you allocate, so it's best to allocate only once if possible. It would be much faster to use your `rep` to create `col1` without a loop, which removes one loop. Then turn `col1` into a data frame and declare a `col2` full of `NA_character_` (allocating the memory only once). Then use `df$col2[j] <- vec[j]` to modify the memory location directly. Declare first, then loop through. – Adam Sampson Nov 16 '20 at 16:11
  • FYI: memory allocation is the biggest reason that `lapply` and `purrr::map` are faster than a loop. So reorganizing your code and pre-allocating will give a big bump in speed and make things easier to read as well. – Adam Sampson Nov 16 '20 at 16:12
  • `expand.grid(col2=vec, col1=vec, stringsAsFactors=FALSE)[2:1]` – jogo Nov 17 '20 at 09:47
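The pre-allocation advice in the comments can be sketched while keeping both loops, since the question is specifically about nested loops. This is only one possible arrangement, not the commenters' exact code: it fills two pre-sized character vectors and builds the data frame once at the end. The names `n` and `row` are introduced here purely for illustration.

```r
vec <- c("red", "blue", "green", "orange")
n <- length(vec)

# Allocate both result columns once, at their final length (n * n).
col1 <- character(n * n)
col2 <- character(n * n)

# seq_along() avoids the 1:length(vec) pitfall, which yields c(1, 0) for an empty vector.
for (i in seq_along(vec)) {
  for (j in seq_along(vec)) {
    row <- (i - 1) * n + j   # position of the (i, j) pair in the output
    col1[row] <- vec[i]      # outer loop value: changes every n rows
    col2[row] <- vec[j]      # inner loop value: cycles through vec
  }
}

# Build the data frame a single time, after the loops have finished.
final <- data.frame(col1 = col1, col2 = col2, stringsAsFactors = FALSE)
final
```

Compared with the code in the question, nothing is grown with `cbind` or `rbind` inside the loops, and each pass through the inner loop writes exactly one row, so no temporary objects are needed.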

1 Answer

So the desired end result is a data.frame with every combination of two colours, correct?

library(tidyverse)

tibble() %>% 
  expand(
    col1=c("red", "blue", "green", "orange"),
    col2=c("red", "blue", "green", "orange")
  )
# A tibble: 16 x 2
   col1   col2  
   <chr>  <chr> 
 1 blue   blue  
 2 blue   green 
 3 blue   orange
 4 blue   red   
 5 green  blue  
 6 green  green 
 7 green  orange
 8 green  red   
 9 orange blue  
10 orange green 
11 orange orange
12 orange red   
13 red    blue  
14 red    green 
15 red    orange
16 red    red   

As a rule of thumb, if you're resorting to loops in R, there's generally a better (i.e. more efficient and succinct) way of doing it.
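For comparison, here is a base R sketch of the same idea with no loops, based on the one-liners suggested in the comments on the question. Note that the tidyverse output above is sorted alphabetically, whereas the output in the question follows the order of `vec`; the two variants below keep the order of `vec`. The names `by_rep` and `by_grid` are made up for this example.

```r
vec <- c("red", "blue", "green", "orange")
n <- length(vec)

# Vectorised: each value of vec repeated n times, paired with vec cycled n times.
by_rep <- data.frame(
  col1 = rep(vec, each = n),
  col2 = rep(vec, times = n),
  stringsAsFactors = FALSE
)

# expand.grid() varies its first argument fastest, so col2 is listed first
# and the columns are swapped back into col1, col2 order with [2:1].
by_grid <- expand.grid(col2 = vec, col1 = vec, stringsAsFactors = FALSE)[2:1]

# Both approaches should give the same rows in the same order.
stopifnot(
  identical(by_rep$col1, by_grid$col1),
  identical(by_rep$col2, by_grid$col2)
)
```

The columns are compared one at a time rather than with `identical()` on the whole data frames, because `expand.grid()` attaches extra attributes to its result.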

Limey