0

I have a large matrix which comprises 1,2 and missing (coded as NA) values. The matrix has 500000 rows by 10000 columns. There are approximately 0.05% 1- or 2-values, and the remaining values are NA.

I would like to reorder the rows and columns of the matrix so that the top left corner of the matrix comprises a relatively high number of 1s and 2s compared to the rest of the matrix. In other words, I would like to create a relatively datarich subset of the matrix, by reordering the matrix rows and columns.

Is there an efficient method of achieving this in R, perhaps using a library? I would also be interested in solutions in Python or Java, but I would prefer to perform this in R as it is the language that's most familiar to me.

I thought that there maybe a set of optimisation procedures that I could use as my working matrix is too large to do the reorganisation by eye.

I have reverted a set of edits so that the question remains consistent with the current answers.

joel38237
  • 159
  • 1
  • 11
  • With your clarification (and a good night's sleep), I'm thinking a test for "density", or ratio of `is.numeric/is.na` for each row and column could be used to generate the reordering. – Carl Witthoft May 15 '14 at 11:57
  • So what would you consider a desirable reordering? Roland's answer does about the best possible unless you want to "weight" values in the very top left more than those a bit farther out. – Carl Witthoft May 15 '14 at 12:15

2 Answers2

3

Like this?

#some sparse data
set.seed(42)
p <- 0.0005
mat <- matrix(sample(c(1, 2, NA), 1e4, TRUE, c(p/2, p/2, 1-p)), ncol=50)

#order columns and rows by the number of NA values in them   
mat <- mat[order(rowSums(is.na(mat))), order(colSums(is.na(mat)))]

#only show columns and rows containing non-NA values
mat[rowSums(!is.na(mat)) > 0, colSums(!is.na(mat)) > 0]
#       [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]   NA   NA   NA   NA    2   NA
# [2,]   NA   NA   NA   NA   NA    2
# [3,]   NA   NA    2   NA   NA   NA
# [4,]   NA    1   NA   NA   NA   NA
# [5,]    1   NA   NA   NA   NA   NA
# [6,]   NA   NA   NA    2   NA   NA
Carl Witthoft
  • 20,573
  • 9
  • 43
  • 73
Roland
  • 127,288
  • 10
  • 191
  • 288
  • 1
    Probably the two of us should combine our answers into one matrix-crusher :-) – Carl Witthoft May 14 '14 at 15:39
  • Thanks for this answer. How about for situations where ordering my rows and columns doesn't provide a dense corner? I've edited the question in response to this comment. – joel38237 May 15 '14 at 10:12
  • Why are you asking me this? The task was to create a dense corner by reordering. If that is not possible, you have to say what you want to do. – Roland May 15 '14 at 10:47
  • @CarlWitthoft I don't understand your solution. (Or maybe I should say, I don't think it works for the general case.) – Roland May 15 '14 at 11:43
  • I agree -- I think, sans other weighting functions, your solution does the best reordering possible if the rows and columns are not allowed to be "jumbled" as in my edited approach. – Carl Witthoft May 15 '14 at 12:12
  • My earlier comment was missing a key point. I'm interested in situations where ordering rows and columns by the number of non-NA values does not provide a dense corner. I've edited the question in response to these comments. – joel38237 May 15 '14 at 12:48
  • Also, in response to the comment by Carl Witthoft, I should add that rows and columns cannot be "jumbled". – joel38237 May 15 '14 at 13:03
  • I don't understand your criterion for a "dense corner". Your new example matrix is not very sparse and to expect non-NA values in the 3x3 top corner seems to be arbitrary. If you can show an algorithm that gives your desired result I can show you how to implement it in R. Otherwise I can't help you further. – Roland May 15 '14 at 13:07
2

Something like this?

Rgames> bar
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    1   NA   NA   NA    3
[3,]   NA   NA   NA   NA   NA
[4,]    2   NA   NA   NA    4
[5,]   NA   NA   NA   NA   NA

Rgames> rab<-bar[order(bar[,1]),]
Rgames> rab
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA    3
[2,]    2   NA   NA   NA    4
[3,]   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA
Rgames> rab[,order(rab[1,])]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3   NA   NA   NA
[2,]    2    4   NA   NA   NA
[3,]   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA

EDIT, as Roland pointed out, in a more general situation that won't come close. Now, if one is allowed to 'jumble' the rows and columns, this would do it:

for(j in 1:nrow(mfoo)) mat[j,]<-mat[j,order(mat[j,])]

for(j in 1:ncol(mat)) mat[,j]<-mat[order(mat[,j]),j]

I suspect that's not what is desired, so I'll go off and think some more about ordering "criteria"

Carl Witthoft
  • 20,573
  • 9
  • 43
  • 73
  • Thanks for this answer. How about for situations where ordering my rows and columns doesn't provide a dense corner? I've edited the question in response to this comment. – joel38237 May 15 '14 at 10:20