
I am wondering whether it is possible to shuffle a 4x4 table of counts while keeping every row sum and every column sum constant. Admittedly, I am a beginner at programming, so the code I have included below may not be easy on the eyes.

Any help would be appreciated, thanks.

PS: If you must know, the data set is a survey of car preference by ethnicity.

CarPreference <- read.table ( text = "
African 3 0 1 1
Asian 2 1 0 1
Hispanic 0 1 3 1
White 0 1 4 1
" )

row.names(CarPreference) <- CarPreference[,1]
colnames(CarPreference) <- c("Car Type","Car","Truck","SUV","Motorcycle")

CarPreference <- CarPreference[,-1]
CarPreference <- as.matrix(CarPreference)  # as.matrix() returns a new object, so assign it to keep the matrix
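A more compact way to build the same labelled matrix is sketched below. It relies on `read.table`'s `row.names = 1` argument to turn the first column into row names; the first entry of `col.names` (here `"Ethnicity"`, a label chosen only for illustration) is dropped together with that column. Because `as.matrix()` does not modify its argument, the result is assigned to a new name.

```r
# Read the survey counts; column 1 (ethnicity) becomes the row names
CarPreference <- read.table(text = "
African  3 0 1 1
Asian    2 1 0 1
Hispanic 0 1 3 1
White    0 1 4 1
", row.names = 1,
   col.names = c("Ethnicity", "Car", "Truck", "SUV", "Motorcycle"))

# as.matrix() returns a copy, so keep the result
observed <- as.matrix(CarPreference)

rowSums(observed)  # should be 5 4 5 6 (African, Asian, Hispanic, White)
colSums(observed)  # should be 5 3 8 4 (Car, Truck, SUV, Motorcycle)
```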

observed <- rbind(c(3,0,1,1),c(2,1,0,1),c(0,1,3,1),c(0,1,4,1))
deals=10000
observed.boot = array(NA,c(4,4,deals))
H0 <- c(rep(1,colSums(observed)[1]),rep(0,colSums(observed)[2]),rep(1,colSums(observed)[3]),rep(0,colSums(observed)[4]))
for (i in 1:deals)
{
data.boot <- sample(H0,sum(observed),replace=FALSE)

row1.boot <- data.boot[1:rowSums(observed)[1]]
row2.boot <- data.boot[(rowSums(observed)[1]+1):(rowSums(observed)[1]+rowSums(observed)[2])]
row3.boot <- data.boot[(rowSums(observed)[1]+rowSums(observed)[2]+1):(rowSums(observed)[1]+rowSums(observed)[2]+rowSums(observed)[3])]
row4.boot <- data.boot[(rowSums(observed)[1]+rowSums(observed)[2]+rowSums(observed)[3]+1):sum(observed)]

col1.boot <- data.boot[1:colSums(observed)[1]]
col2.boot <- data.boot[(colSums(observed)[1]+1):(colSums(observed)[1]+colSums(observed)[2])]
col3.boot <- data.boot[(colSums(observed)[1]+colSums(observed)[2]+1):(colSums(observed)[1]+colSums(observed)[2]+colSums(observed)[3])]
col4.boot <- data.boot[(colSums(observed)[1]+colSums(observed)[2]+colSums(observed)[3]+1):sum(observed)]

# columns 3 and 4 are left as NA: this is the part I don't know how to fill in
observed.boot[,,i] <- rbind(
c(sum(row1.boot),length(row1.boot)-sum(row1.boot),NA,NA),
c(sum(row2.boot),length(row2.boot)-sum(row2.boot),NA,NA),
c(sum(row3.boot),length(row3.boot)-sum(row3.boot),NA,NA),
c(sum(row4.boot),length(row4.boot)-sum(row4.boot),NA,NA))
}
  • What do you mean by shuffle? Do you mean you're going to use the exact numbers in your current matrix (4 0s, 8 1s, 1 2, 2 3s, and 1 4) or any integers as long as the row and column sums are equal to the original? Does this need to be random within the space of all matrices that meet your requirements? – josliber May 11 '14 at 04:12
  • Rather, I mean that the sum of all values in each row/column remains constant. In this matrix, sum of row 1 = 5, sum of row 2 = 4, etc. Similarly, col 1 = 5, col 2 = 3, etc. What I am really trying to do is a bootstrap version of Fisher's exact test on the 4x4 matrix. I could do it for a 2x2, but I don't know how to generate numbers randomly in the "observed.boot" part for a 4x4 so that it meets those conditions. – user3624658 May 11 '14 at 04:16
  • I looked over the permutation test, which is similar, but not what I was hoping to achieve. Although there is a finite number of permutations, I would like to have both the row and column sums maintained with each "permutation." – user3624658 May 11 '14 at 05:06

1 Answer


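Boiling it down, you want to randomly re-pair the row labels and column labels of the individual observations while keeping the number of observations carrying each label fixed. You can do this by expanding the table into a vector x of row indices and a vector y of column indices (one entry per observation), then repeatedly shuffling y and re-tabulating: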

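set.seed(144)  # makes the shuffles reproducible
observed <- rbind(c(3,0,1,1),c(2,1,0,1),c(0,1,3,1),c(0,1,4,1))
x <- rep(1:nrow(observed), rowSums(observed))  # row label of each observation
y <- rep(1:ncol(observed), colSums(observed))  # all column labels, one per observation
samples <- lapply(1:10000, function(a) table(x, sample(y)))  # shuffle column labels, then re-tabulate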
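To see why this keeps the margins, note that any table can be expanded into one (row, column) pair per observation, and tabulating those pairs gives the table back. The sketch below does this for `observed` with `row()` and `col()`. Shuffling only the column labels changes which row each label is paired with, but the count of every row label and every column label, and hence every row and column sum, stays the same.

```r
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))

# One entry per observation: its row index and its column index
x <- rep(as.vector(row(observed)), as.vector(observed))
y <- rep(as.vector(col(observed)), as.vector(observed))

# Tabulating the original pairs reproduces the observed table
stopifnot(all(table(x, y) == observed))

# y holds the same labels as rep(1:ncol(observed), colSums(observed)),
# so shuffling either vector gives the same distribution of tables
stopifnot(all(sort(y) == rep(1:ncol(observed), colSums(observed))))

# After a shuffle the pairing changes but the margins do not
shuffled <- table(x, sample(y))
stopifnot(all(rowSums(shuffled) == rowSums(observed)),
          all(colSums(shuffled) == colSums(observed)))
```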

Now, samples contains a list of bootstrapped tables, with row and column sums matching observed.

samples[[1]] 
# x   1 2 3 4
#   1 1 1 2 1
#   2 0 0 2 2
#   3 2 0 2 1
#   4 2 2 2 0
samples[[10000]]
# x   1 2 3 4
#   1 1 1 2 1
#   2 2 1 1 0
#   3 1 1 2 1
#   4 1 0 3 2
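If you want to confirm the margins for every resampled table rather than a couple of printed ones, a quick check is sketched below. Base R's `r2dtable()` (in the `stats` package, using Patefield's algorithm) is shown alongside as an alternative way to draw random tables with fixed row and column sums; it returns plain matrices instead of `table` objects. The helper name `same_margins` is hypothetical.

```r
set.seed(144)
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))
x <- rep(1:nrow(observed), rowSums(observed))
y <- rep(1:ncol(observed), colSums(observed))
samples <- lapply(1:10000, function(a) table(x, sample(y)))

# Hypothetical helper: TRUE if a table's row and column sums match observed
same_margins <- function(tab) {
  all(rowSums(tab) == rowSums(observed)) &&
    all(colSums(tab) == colSums(observed))
}
all(vapply(samples, same_margins, logical(1)))  # TRUE

# Alternative: r2dtable() draws tables with the given margins directly
alt <- r2dtable(10000, as.integer(rowSums(observed)), as.integer(colSums(observed)))
all(vapply(alt, same_margins, logical(1)))  # TRUE
```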

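Shuffling the labels this way is equivalent to sampling tables that have the same row and column sums as your original table. Note that the tables are not drawn uniformly from that set: each one appears with its probability under the multivariate hypergeometric distribution, which is exactly the null distribution that Fisher's exact test conditions on.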
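Since the question is after a resampling version of Fisher's exact test, here is one way to turn the resampled tables into a p-value. It is a sketch that assumes the usual Fisher ordering: with the margins fixed, a table's null probability is proportional to 1 / prod(n_ij!), so tables are ranked by `-sum(lfactorial(tab))`, and the p-value is the share of resampled tables that are no more probable than the observed one (counting the observed table itself, hence the `+ 1`). The tolerance `tol` is an assumption chosen to keep exact ties from being lost to rounding. `fisher.test(..., simulate.p.value = TRUE)` runs a comparable simulation and is included only as a cross-check; the two values should agree up to Monte Carlo error.

```r
set.seed(144)
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))
x <- rep(1:nrow(observed), rowSums(observed))
y <- rep(1:ncol(observed), colSums(observed))
samples <- lapply(1:10000, function(a) table(x, sample(y)))

# Log null probability up to a constant (margins are fixed):
# log P(table) = const - sum(log(n_ij!))
log_prob <- function(tab) -sum(lfactorial(tab))

obs_stat <- log_prob(observed)
sim_stat <- vapply(samples, log_prob, numeric(1))

# Tables at most as probable as the observed one count as "as extreme"
tol <- 1e-7
p_value <- (1 + sum(sim_stat <= obs_stat + tol)) / (length(sim_stat) + 1)
p_value

# Cross-check against base R's simulated Fisher test
fisher.test(observed, simulate.p.value = TRUE, B = 10000)$p.value
```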

josliber
  • Wow! This is exactly what I was hoping for in such few lines. I am new to this website, how can I leave you positive feedback/votes? Also, just wondering did you do set.seed(144) because there are 12 cells in the 4x4 and 12^2=144 ...? – user3624658 May 11 '14 at 05:31
  • @user3624658 No, I just set the seed so it would be reproducible. I think you can't upvote until you have 15 reputation :) – josliber May 12 '14 at 00:34