I want to get every possible combination of elements taken across several vectors, one element per vector. These questions show what I am trying to do in Python ("Get the cartesian product of a series of lists?") and in R ("Cartesian product data frame").
However, when I follow the R answer from the latter (https://stackoverflow.com/a/4309350/9096420), I run out of memory (i.e., "cannot allocate vector of size 86792.1 Gb"; see also "R memory management / cannot allocate vector of size n Mb"). I have tried a few of the answers there, but the vector is too large for any of them to help.
This leads me to think that something is wrong with how I am approaching this problem. There are many possible combinations, but this seems solvable.
My data:
dat<-structure(list(rows = c(62L, 63L, 64L, 65L, 68L, 69L, 70L, NA,
NA, NA, NA, NA, NA), rows.1 = c(119L, 120L, 122L, 123L, 124L,
125L, NA, NA, NA, NA, NA, NA, NA), rows.2 = c(137L, 138L, 139L,
140L, 141L, 142L, 143L, 144L, 145L, NA, NA, NA, NA), rows.3 = c(161L,
162L, 163L, 164L, 165L, 166L, 167L, NA, NA, NA, NA, NA, NA),
rows.4 = c(168L, 169L, 170L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), rows.5 = c(148L, 149L, 150L, 151L, 152L, 153L,
154L, 155L, 156L, NA, NA, NA, NA), rows.6 = c(135L, 136L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), rows.7 = c(108L,
109L, 110L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), rows.8 = c(116L,
117L, 118L, 121L, NA, NA, NA, NA, NA, NA, NA, NA, NA), rows.9 = c(178L,
180L, 181L, 182L, 183L, NA, NA, NA, NA, NA, NA, NA, NA),
rows.10 = c(179L, 184L, 185L, 186L, 187L, 188L, 189L, 190L,
191L, 192L, 193L, 194L, 195L), rows.11 = c(50L, 51L, 52L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-13L))
dat
   rows rows.1 rows.2 rows.3 rows.4 rows.5 rows.6 rows.7 rows.8 rows.9 rows.10 rows.11
1    62    119    137    161    168    148    135    108    116    178     179      50
2    63    120    138    162    169    149    136    109    117    180     184      51
3    64    122    139    163    170    150     NA    110    118    181     185      52
4    65    123    140    164     NA    151     NA     NA    121    182     186      NA
5    68    124    141    165     NA    152     NA     NA     NA    183     187      NA
6    69    125    142    166     NA    153     NA     NA     NA     NA     188      NA
7    70     NA    143    167     NA    154     NA     NA     NA     NA     189      NA
8    NA     NA    144     NA     NA    155     NA     NA     NA     NA     190      NA
9    NA     NA    145     NA     NA    156     NA     NA     NA     NA     191      NA
10   NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     192      NA
11   NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     193      NA
12   NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     194      NA
13   NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     195      NA
My goal is to combine one value from each column with every possible combination of values from other columns (only allowing one value from each column). A small example of only two columns that works:
expand.grid(dat[,1],dat[,2])
Var1 Var2
1 62 119
2 63 119
3 64 119
4 65 119
5 68 119
6 69 119
7 70 119
8 NA 119
9 NA 119
10 NA 119
11 NA 119
12 NA 119
13 NA 119
14 62 120
15 63 120
16 64 120
17 65 120
18 68 120
19 69 120
20 70 120
21 NA 120
22 NA 120
23 NA 120
24 NA 120
25 NA 120
26 NA 120
27 62 122
28 63 122
# ... output truncated
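In the output above, the NA padding rows take part in the product like any other value. A minimal sketch, assuming the NA entries are only padding and can be dropped from each column before combining (`dat2` is a hypothetical stand-in for the first two columns of `dat`, and `vals` is an illustrative name):

```r
# Hypothetical stand-in for the first two columns of dat (NA = padding)
dat2 <- data.frame(
  rows   = c(62L, 63L, 64L, 65L, 68L, 69L, 70L),
  rows.1 = c(119L, 120L, 122L, 123L, 124L, 125L, NA)
)

# Drop the NA padding from each column; the result is a list of
# vectors of different lengths (7 and 6 values here)
vals <- lapply(dat2, function(x) x[!is.na(x)])

# expand.grid accepts a list of vectors as its first argument
combos <- expand.grid(vals)
nrow(combos)   # 42, i.e. 7 * 6
```

With the padding removed, the two-column product has 42 rows instead of 13 * 13 = 169, and none of them contains NA.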
When I try to do it for the entire dataset, I get memory issues:
Either
expand.grid(dat)
OR
expand.grid(dat[,1],dat[,2],dat[,3],dat[,4],dat[,5],dat[,6],dat[,7],dat[,8],
dat[,9],dat[,10],dat[,11],dat[,12])
(which I assume is equivalent) produces an error:
Error: cannot allocate vector of size 86792.1 Gb
Is there a simpler way to do this that gets around memory issues? What am I doing wrong here?
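As a rough check on where the 86792.1 Gb figure might come from, the arithmetic below assumes that the error refers to a single integer column of the result (4 bytes per element, with Gb meaning 2^30 bytes). The non-NA counts in `n_valid` are read off the printed data above; the variable names are illustrative only.

```r
# Rows expand.grid would create when every column keeps all 13 entries
n_rows <- 13^12                      # 23298085122481

# Size of one integer column of that length, in Gb (2^30 bytes)
gb_per_column <- n_rows * 4 / 2^30
round(gb_per_column, 1)              # 86792.1

# Number of non-NA values in each of the 12 columns
n_valid <- c(7, 6, 9, 7, 3, 9, 2, 3, 4, 5, 13, 3)

# Rows in the product if the NA padding is dropped first
prod(n_valid)                        # 334348560
```

Under these assumptions the allocation that fails is one column of the 13^12-row grid. Without the NA padding the product is still about 334 million rows, which is large but several orders of magnitude smaller.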
Here is another way to do it, with nested for loops, but it is incredibly cumbersome because it requires a for loop for every column of data:
output<-NULL
# Build every combination of the first three columns, one row at a time
for(h in 1:13){
  for(i in 1:13){
    for(j in 1:13){
      output<-rbind(output,
                    c(dat[h,1],dat[i,2],dat[j,3]))
    }
  }
}
output
[,1] [,2] [,3]
[1,] 62 119 137
[2,] 62 119 138
[3,] 62 119 139
[4,] 62 119 140
[5,] 62 119 141
[6,] 62 119 142
[7,] 62 119 143
[8,] 62 119 144
[9,] 62 119 145
[10,] 62 119 NA
[11,] 62 119 NA
[12,] 62 119 NA
[13,] 62 119 NA
[14,] 62 120 137
[15,] 62 120 138
[16,] 62 120 139
[17,] 62 120 140
[18,] 62 120 141
[19,] 62 120 142
[20,] 62 120 143
[21,] 62 120 144
[22,] 62 120 145
# ... output truncated
Extending this to all twelve columns would look like:
for(h in 1:13){
for(i in 1:13){
for(j in 1:13){
for(k in 1:13){
for(l in 1:13){
for(m in 1:13){
for(n in 1:13){
for(o in 1:13){
for(p in 1:13){
for(q in 1:13){
for(r in 1:13){
for(s in 1:13){
output<-rbind(output,
c(dat[h,1],dat[i,2],dat[j,3],dat[k,4],
dat[l,5],dat[m,6],dat[n,7],dat[o,8],
dat[p,9],dat[q,10],dat[r,11],dat[s,12])
)
}}}}}}}}}}}}
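One possible alternative to writing a loop per column is to treat the position of a combination as a mixed-radix number and decode only the positions that are needed. This is a sketch of a different technique, not the approach from the linked answers; `combos_at` is a hypothetical helper, and `vals` is a small stand-in for the per-column non-NA values. It does not reduce the total number of combinations. It only avoids holding all of them in memory at once, so they could be processed in chunks.

```r
# Hypothetical helper: build only the combinations at 1-based positions
# `idx`, in the order expand.grid uses (first column varies fastest)
combos_at <- function(vals, idx) {
  sizes <- lengths(vals, use.names = FALSE)
  stopifnot(all(idx >= 1), all(idx <= prod(sizes)))
  rem <- idx - 1                     # 0-based position, stored as double
  out <- vector("list", length(vals))
  names(out) <- names(vals)
  for (j in seq_along(vals)) {
    # Digit j of the mixed-radix position selects the value from column j
    out[[j]] <- vals[[j]][rem %% sizes[j] + 1]
    rem <- rem %/% sizes[j]
  }
  as.data.frame(out)
}

# Small stand-in: a few non-NA values from the first three columns of dat
vals <- list(rows   = c(62L, 63L),
             rows.1 = c(119L, 120L, 122L),
             rows.2 = c(137L, 138L))

combos_at(vals, 1)    # 62 119 137 (first combination)
combos_at(vals, 12)   # 63 122 138 (last of 2 * 3 * 2 = 12)
```

For the full data, `idx` could be walked in blocks (for example `1:1e6`, then the next million), with each block processed and discarded before the next one is built.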