1

I have a matrix like

      [,1] [,2]
 [1,]    1    3
 [2,]    4    6
 [3,]   11   12
 [4,]   13   14

I want to convert this matrix to a vector like this:

# indices 1-6, 11-14 = 1, gap indices 7-10 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)

The idea: the matrix has values from 1 through 14, and the length of the vector is also 14. If you take the first column as the start and the second column as the end of a range, then for the ranges present in the matrix, i.e., 1-3, 4-6, 11-12, 13-14 (or equivalently 1-6, 11-14), I want the values at those indices to be 1 in my output vector. The gap of 7-10 in my matrix should have a value of 0 at indices 7-10 in my output vector. (Thanks for the edit)

However, sometimes the matrix does not extend to the last index of the vector. I always know the length of the vector after the transformation, though; say it is 20 in this case. Then the resulting vector should look like this:

# indices 1-6, 11-14 = 1, gap indices 7-10 = 0, indices 15-20 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0)

How can I do that without a loop? My matrix is quite long, and the loop I tried is slow.

user1938809
  • What's the transformation? – Roman Luštrik Jun 15 '13 at 07:28
  • What is determining whether the vector elements are 1 or 0? Also, the vector is longer than the number of elements in the matrix – alexwhan Jun 15 '13 at 07:32
  • @alexwhan, Roman, Since the OP seems to be unresponsive, I've gone ahead and edited the question. – Arun Jun 15 '13 at 08:00
  • Sorry, I was not clear enough. I will edit the question. Thanks. – user1938809 Jun 15 '13 at 16:39
  • OP: You say in the comments to Arun's answer that your matrix is 20 million by 2. This is very important information that should go into your question. Also, what is the expected length of `xx` (big, I imagine), and are there any overlaps between your ranges (your example suggests "no")? – flodel Jun 15 '13 at 21:24

3 Answers

2

Here's an answer using the IRanges package:

library(IRanges)
# `xx` here is the two-column matrix of range starts and ends
xx.ir <- IRanges(start = xx[,1], end = xx[,2])
as.vector(coverage(xx.ir))
# [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1

If you know the minimum and maximum index of the full vector, you can pad the result with zeros:

max.val <- 20
min.val <- 1
c(rep(0, min.val-1), as.vector(coverage(xx.ir)), rep(0, max.val-max(xx)))
Arun
  • This is not generic enough. What do I do if I know the total length is 20 instead of 14? Thanks. – user1938809 Jun 15 '13 at 16:17
  • This answers your question (well, the edit to your question that I made to improve it). If you think this doesn't answer what you require, please edit your question explaining *clearly* what you *really* want. It's impossible to know what you think. – Arun Jun 15 '13 at 16:25
  • If you look at the matrix, the last value is 14. But, the matrix does not always contain the last value. For example, if you know the size of vector is 20. Then, after the mapping, it will give c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0). Is this clear? Thanks. – user1938809 Jun 15 '13 at 16:29
  • yes, but this is pretty straightforward I'd think. Check the edit. – Arun Jun 15 '13 at 16:39
  • Sorry, this method only works in a small setting. My matrix is 20 million+ by 2, and I use the ff package to store it. May I ask how to just record the indexes, like (1,2,3,4,5,6,11,12,13,14), without a loop? Thanks in advance! – user1938809 Jun 15 '13 at 17:10
  • Again, I don't follow what you mean, but I believe this is a separate question. I'd advise posting a new one, after investing some more time in framing the question with little or no ambiguity. – Arun Jun 15 '13 at 17:16
1

@Arun's answer seems better.

Now that I understand the problem (or do I?), here is a solution in base R that makes use of the idea that only contiguous sequences of zeroes need to be kept.

find.ones <- function (mat) {
  ones <- rep(0, max(mat))
  ones[c(mat)] <- 1
  ones <- paste0(ones, collapse="")
  ones <- gsub("101", "111", ones)
  ones <- as.numeric(strsplit(ones, "")[[1]])
  ones
}

On the OP's original example:

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol=2, byrow=TRUE)
find.ones(m)
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1

To benchmark the solution, let's make a matrix big enough:

set.seed(10)
m <- sample.int(n=1e6, size=5e5)
m <- matrix(sort(m), ncol=2, byrow=TRUE)

head(m)
     [,1] [,2]
[1,]    1    3
[2,]    4    5
[3,]    9   10
[4,]   11   13
[5,]   14   18
[6,]   22   23

system.time(ones <- find.ones(m))

 user  system elapsed 
1.167   0.000   1.167 
asb
  • This is not an answer to the OP's question. – alexwhan Jun 15 '13 at 07:29
  • Okay, I have explained that it is not clear if he wants a transformation or just change the matrix to a vector. I answered based on my interpretation of the question. When the question becomes clearer, I can edit my answer. I still don't see why it needs downvoting. – asb Jun 15 '13 at 07:32
  • Your questions should be directed to the OP in the comments, like everyone else has done. The downvote tooltip says: "this answer is not useful". And yes, this is not useful for this question. So, when you make the edit, we'll upvote. – Arun Jun 15 '13 at 07:44
  • @Arun: Okay! I should have stuck to the comments. – asb Jun 15 '13 at 07:49
  • Our results don't match for the big matrix. – Arun Jun 15 '13 at 09:23
  • Do `m <- m[1:20, ]` in your big matrix and compare our results. You'll find that between 14-18, your results are not what they're supposed to be. – Arun Jun 15 '13 at 09:27
  • Dang, I think the question is a moving target! :D I'm going to add an edit to ask the OP to use your solution. – asb Jun 15 '13 at 10:39
1

Throwing this one in here: it uses base R and should be reasonably fast, since the inevitable loop is handled by `rep`:

zero.lengths <- m[,1] - c(0, head(m[,2], -1)) - 1
one.lengths  <- m[,2] - m[,1] + 1

rep(rep(c(0, 1), nrow(m)),
    as.vector(rbind(zero.lengths, one.lengths)))
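As a sanity check, here is a sketch of the `rep` approach applied to the example matrix from the question and padded to a known total length. The name `n`, the variable `res`, and the padding step are assumptions added for illustration; the padding only covers the case where the last range ends at or before `n`.

```r
# Example matrix from the question: column 1 = range start, column 2 = range end.
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
n <- 20  # assumed known total length of the output (hypothetical name)

# Number of zeros before each range, and number of ones inside each range.
zero.lengths <- m[, 1] - c(0, head(m[, 2], -1)) - 1
one.lengths  <- m[, 2] - m[, 1] + 1

# Interleave the run lengths (zeros, ones, zeros, ones, ...) and expand them.
res <- rep(rep(c(0, 1), nrow(m)),
           as.vector(rbind(zero.lengths, one.lengths)))

# Pad with trailing zeros up to the known length.
res <- c(res, rep(0, n - length(res)))
res
```

For the example matrix the interleaved run lengths are 0, 3, 0, 3, 4, 2, 0, 2, so the zero-length runs simply contribute nothing and adjacent ranges such as 1-3 and 4-6 merge naturally.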

Or another solution using sequence:

out <- integer(m[length(m)])    # or `integer(20)` following OP's edit.
one.starts  <- m[,1]
one.lengths <- m[,2] - m[,1] + 1
one.idx <- sequence(one.lengths) + rep(one.starts, one.lengths) - 1L
out[one.idx] <- 1L
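To make the `sequence` variant concrete, here is a sketch run on the example matrix from the question with an assumed known length of 20 (the name `n` is hypothetical). Note that `one.idx` is itself the vector of indices covered by the ranges, which may be all that is needed if the full 0/1 vector is too large to be practical.

```r
# Example matrix from the question: column 1 = range start, column 2 = range end.
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
n <- 20  # assumed known total length of the output (hypothetical name)

out <- integer(n)                      # all zeros to begin with
one.starts  <- m[, 1]
one.lengths <- m[, 2] - m[, 1] + 1

# sequence() yields 1..k for each range length k; adding the repeated
# start (minus 1) turns those offsets into absolute positions.
one.idx <- sequence(one.lengths) + rep(one.starts, one.lengths) - 1L

out[one.idx] <- 1L
one.idx   # the covered indices
out       # the 0/1 vector of length n
```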
flodel