
I am trying to fill some rows of a 500 x 2 matrix with the row vector (1, 0) using this code (the last line is there to verify the result):

data<-matrix(ncol=2,nrow=500)
data[41:150,]<-matrix(c(1,0),nrow=1,ncol=2,byrow=TRUE)
data[41:45,]

But the result is

> data[41:45,]
     [,1] [,2]
[1,]    1    1
[2,]    0    0
[3,]    1    1
[4,]    0    0
[5,]    1    1

instead of

> data[41:45,]
     [,1] [,2]
[1,]    1    0
[2,]    1    0
[3,]    1    0
[4,]    1    0
[5,]    1    0

(1) What am I doing wrong?

(2) Why aren't the row indices in the result 41, 42, 43, 44 and 45?

Zweifler

3 Answers


You're trying to fill a part of the matrix, so the block you're trying to drop in there should be of the right size:

 data[41:150,]<-matrix(c(1,0),nrow=110,ncol=2,byrow=TRUE)
 # nrow = 110 (the number of target rows), not 1

Otherwise the replacement is treated as a plain vector and filled in column-wise, with recycling. Try this, for example:

 data[41:150,] <- matrix(c(1,2,3,4,5), nrow=5, ncol=2, byrow=TRUE)
 data[41:45,]
     [,1] [,2]
[1,]    1    1
[2,]    3    3
[3,]    5    5
[4,]    2    2
[5,]    4    4

Can one complain? Yes and no. No, because R behaves as documented: a matrix is a vector with a dimension attribute, and recycling works on vectors. Yes, because although recycling can be convenient, it can create false expectations.
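To make the "matrices are vectors" point concrete, here is a minimal base-R sketch. It shows that a matrix is stored column by column. It also reproduces the question's symptom: the 1 x 2 replacement is flattened to `c(1, 0)` and recycled down column 1 and then into column 2. The small matrix `m` is illustrative only.

```r
# A matrix is an atomic vector with a "dim" attribute
m <- matrix(1:6, nrow = 2)
attributes(m)   # $dim [1] 2 3
as.vector(m)    # stored column by column: 1 2 3 4 5 6

# Reproduce the question: the 1 x 2 replacement is flattened to c(1, 0),
# then recycled down column 1 (rows 41..150) and on into column 2
data <- matrix(ncol = 2, nrow = 500)
data[41:150, ] <- matrix(c(1, 0), nrow = 1, ncol = 2, byrow = TRUE)
data[41:44, ]
```

Column 2 starts at element 111 of the recycled sequence. That element is a `1`, so row 41 reads `1 1` and row 42 reads `0 0`.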

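Why aren't the row indices 41, 42, 43, ...? That's just the way matrices and vectors behave: subsetting returns a new object, and when it has no row names R labels each element by its position in that result.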

> (1:10)[5:6]
[1] 5 6

(Notice there's [1] in the output, not [5].)

Data frames behave differently. They carry row names, so a slice shows the original row numbers:

 as.data.frame(data)[45:50,]
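If you want to stay with a matrix and still see the original positions, you can give the matrix row names before subsetting. The sketch below uses base R only and is not part of the answer above. The fill uses the right-sized replacement from this answer.

```r
data <- matrix(NA_real_, nrow = 500, ncol = 2)
rownames(data) <- seq_len(nrow(data))   # row names "1", "2", ..., "500"
data[41:150, ] <- matrix(c(1, 0), nrow = 110, ncol = 2, byrow = TRUE)
data[41:45, ]   # rows now print as 41, 42, ..., 45 instead of [1,] ... [5,]
```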
lebatsnok
  • Upvoted because it's the only answer so far that explains why the problem was occurring. – IRTFM Oct 15 '18 at 15:14
  • I guess I am in the complainer camp, something seems unnatural, overkillish about this. What would be the *simplest* way of filling more than one row at once? – Zweifler Oct 15 '18 at 16:13
  • Rows and columns are not created equal for matrices; the "natural" way of thinking here is column-wise. If you want to do row-wise manipulations, you can use the double-transpose trick, e.g., `dt <- t(data); dt[,41:150] <- c(1,0); t(dt)[41:45,]` – lebatsnok Oct 15 '18 at 16:26
  • Second thoughts about it: I think it may seem overkillish precisely because the recycling seems so tempting and works intuitively in simple cases. But recycling is simply not defined for matrices, it works only for vectors. So if you want to work with rows rather than columns (the "natural unit" for R's matrices) you can either (a) provide a replacement of exactly right size as in my answer, (b) transpose your matrix to work with columns instead of rows, or (c) use the double transpose trick as in the above comment. – lebatsnok Oct 15 '18 at 19:10
  • Wait, I just realized those numbers in the second code block aren't even in order! What is going on? Is it the recycling? I mean, standard behavior being column-wise is something one has to work with, but I don't get why they aren't in order. – Zweifler Oct 17 '18 at 00:18
  • They're in the exact column-wise order of your row-wise matrix: `c(matrix(c(1,2,3,4,5), nrow=5, ncol=2, byrow=TRUE)) # [1] 1 3 5 2 4 2 4 1 3 5` – lebatsnok Oct 17 '18 at 05:30
  • I see; I have been taking stuff for granted. I'm afraid I have a couple more questions: (1) why are you reversing the order after the 4 in `# [1] 1 3 5 2 4 2 4 1 3 5` and (2) shouldn't the sixth element [3,2] be NA since the vector has 5 elements but the matrix 6? – Zweifler Oct 18 '18 at 23:10
  • This is all about recycling. In `matrix(1:5, nrow=5, ncol=2, byrow=TRUE)`, the cells are filled row by row, so the first row is `1 2`, the second `3 4`, the third `5 1`, the fourth `2 3` and the fifth `4 5`. When this matrix is treated as a vector (as in your case), the elements are taken column-wise, so the first elements of each row come first (`1 3 5 2 4`), followed by the second elements of each row (`2 4 1 3 5`). Without recycling happening by default, you would get an error (or NA). – lebatsnok Oct 19 '18 at 18:16
  • That explains it. I have learned a lot from this; thanks so much! – Zweifler Oct 21 '18 at 02:49
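The double-transpose one-liner in the comments prints `t(dt)[41:45,]` but never stores the result back in `data`. Here is a minimal base-R sketch of the same trick that keeps the result. It assumes an empty 500 x 2 numeric matrix, as in the question.

```r
data <- matrix(NA_real_, nrow = 500, ncol = 2)

dt <- t(data)              # 2 x 500: each original row is now a column
dt[, 41:150] <- c(1, 0)    # c(1, 0) fills each column top to bottom
data <- t(dt)              # transpose back to 500 x 2, keeping the result

data[41:45, ]
```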

It will be cleaner to just do this column-wise:

data[41:150, 1L] = 1
data[41:150, 2L] = 0
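The same column-wise idea generalizes to a row vector of any length by assigning one element per column. `fill_rows` below is a hypothetical helper written for illustration, not a base R function.

```r
# Hypothetical helper: fill the given rows of m with the row vector v,
# one column-wise assignment per element of v
fill_rows <- function(m, rows, v) {
  stopifnot(length(v) == ncol(m))
  for (j in seq_along(v)) {
    m[rows, j] <- v[j]   # a single value is recycled down the selected rows
  }
  m
}

data <- matrix(NA_real_, nrow = 500, ncol = 2)
data <- fill_rows(data, 41:150, c(1, 0))
data[41:45, ]
```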

You could also accomplish this in one line with matrix indexing like so:

data[cbind(rep(41:150, each = 2L), 1:2)] = 1:0
MichaelChirico
  • I see that in the first code block the second index selects the column, but why is the _L_ needed? _2L_ seems to mean something different in the one-liner. – Zweifler Oct 15 '18 at 16:16
  • Also, _1:0_ seems to work only because _1_ and _0_ are consecutive, if that weren't the case would you do c(a,b) (with a,b being any numbers including non-consecutive)? – Zweifler Oct 15 '18 at 16:19
  • 1
    @zweifler 2L is an integer, 2 is a double. the difference is negligible for most day-to-day usage but I made a habit of it anyway. 1:2 from the one-liner corresponds to 1L and 2L; R compiler knows to make 1:2 integers automatically. and yes, 1:0 is some syntactic sugar owing to being consecutive; c(a,b) is indeed the more general approach. – MichaelChirico Oct 16 '18 at 00:27
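The matrix-indexing one-liner with arbitrary (non-consecutive) values, as discussed in the comments, can be sketched as follows. `rows`, `v` and `idx` are illustrative names, and the values `7` and `-3` are arbitrary. Each row of `idx` is one (row, column) pair, so `v` recycles along those pairs.

```r
rows <- 41:150
v <- c(7, -3)   # any row vector; the values need not be consecutive

# One (row, column) pair per cell: (41,1), (41,2), (42,1), (42,2), ...
idx <- cbind(rep(rows, each = length(v)),
             rep(seq_along(v), times = length(rows)))

data <- matrix(NA_real_, nrow = 500, ncol = 2)
data[idx] <- v  # v is recycled along the rows of idx, so column j gets v[j]
data[41:45, ]
```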

You could use rep.

data[41:150,] <- rep(1:0, each=150-41+1)

#> data[41:45,]
#     [,1] [,2]
#[1,]    1    0
#[2,]    1    0
#[3,]    1    0
#[4,]    1    0
#[5,]    1    0

I think MichaelChirico's approach is the cleanest/safest to use.
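As a quick sanity check, the approaches from the three answers should produce the same matrix. This is a minimal base-R sketch. The variable names `a`, `b`, `d` and `e` are arbitrary, and `rows` stands in for `41:150`.

```r
rows <- 41:150
empty <- matrix(NA_real_, nrow = 500, ncol = 2)

a <- empty  # right-sized replacement matrix
a[rows, ] <- matrix(c(1, 0), nrow = length(rows), ncol = 2, byrow = TRUE)

b <- empty  # column by column
b[rows, 1L] <- 1
b[rows, 2L] <- 0

d <- empty  # rep() with each = number of target rows
d[rows, ] <- rep(1:0, each = length(rows))

e <- empty  # matrix indexing with (row, column) pairs
e[cbind(rep(rows, each = 2L), 1:2)] <- 1:0

identical(a, b) && identical(b, d) && identical(d, e)   # TRUE
```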

Andre Elrico