This is a simple enough question that I'm surprised I can't find any reference to anyone having asked it before. It's not the same as this, nor is it covered by this discussion.

I have a 4-d matrix (dimensions 16x10x15x39) with named dimnames. (This is what you get when you cast a data frame read from, e.g., a CSV. You get the names of the dimnames with names(dimnames(matrix)).)

I then want to replace the values along the first dimension with their fraction of the total along that dimension, so I do this:

matrix2 <- apply(matrix1, c(2,3,4), function(x){x/sum(x)})

But now names(dimnames(matrix2)) is blank for the first dimension. The other dimname names have been preserved.

So: how can I run apply over a matrix with named dimnames and keep the names of all the remaining dimensions?

A reproducible example

Here's a simple example of the problem. Just run the whole code and look at the last two lines.

x <- data.frame(
  name=c("bob","james","sarah","bob","james",
         "sarah","bob","james","sarah","bob",
         "james","sarah"),
  year=c("1995","1995","1995","1995","1995",
         "1995","2005","2005","2005","2005",
         "2005","2005"),
  sample_num=c("sample1","sample1","sample1",
               "sample2","sample2","sample2",
               "sample1","sample1","sample1",
               "sample2","sample2","sample2"),
  value=c(1,2,3,2,3,4,1,2,3,2,3,4)
  )
library(reshape)  # provides cast()
x <- cast(x, sample_num ~ name ~ year)
x_fractions <- apply(x,c(2,3),function(x){x / sum(x)})

names(dimnames(x))
names(dimnames(x_fractions))
LondonRob
  • I wonder why you need to `cast` the data.frame. It's usually better to work with long data format. – Roland Feb 14 '13 at 12:27
  • Hi Roland. I agree, the `cast` looks odd, but there are some sensible display reasons for doing it like that. – LondonRob Feb 14 '13 at 12:37
  • `apply` coerces its outputs to `as.vector`, so maybe the real question is "how did any dimnames survive the `apply` process?" – Carl Witthoft Feb 14 '13 at 13:01

3 Answers

I'm not quite sure what you're looking for, but I think the sweep function fits your goal well. Try:

result <- sweep(test, c(2,3,4), colSums(test), FUN='/')

Where test is the array created by @user2068776. dimnames are preserved.

dimnames(result)
$a
[1] "a1" "a2"

$b
[1] "b1" "b2"

$c
[1] "c1" "c2"

$d
[1] "d1" "d2"
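A self-contained sketch of the same idea, in case it helps to run it in isolation. It rebuilds the small `test` array from the other answer, applies `sweep`, and also tries base R's `prop.table` as an alternative (that alternative is my addition, not part of the original suggestion):

```r
# Small named 4-d array, mirroring the `test` array from the other answer.
test <- array(c(1, 2, 11, 22, 3, 4, 33, 44, 5, 6, 55, 66, 7, 8, 77, 88),
              dim = c(2, 2, 2, 2),
              dimnames = list(a = c("a1", "a2"), b = c("b1", "b2"),
                              c = c("c1", "c2"), d = c("d1", "d2")))

# Divide every slice along the first dimension by its total.
result <- sweep(test, c(2, 3, 4), colSums(test), FUN = "/")

# Base R helper that does the same division for the given margin.
result2 <- prop.table(test, margin = c(2, 3, 4))

names(dimnames(result))      # names of the dimnames
all.equal(result, result2)   # the two approaches should agree
colSums(result)              # totals along dimension 1 should all be 1
```

Both calls do elementwise arithmetic on the original array, which is why its attributes, including the names of the dimnames, carry over to the result.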
Jilber Urbina
  • Thanks Jilber. That certainly does the job! Any insights into the use of apply w.r.t. dimname names? i.e. what's going on or not going on when the first name gets dropped? – LondonRob Feb 19 '13 at 14:36

It is hard to answer this question without a reproducible example, but I'm answering anyway because producing an example here is interesting.

dat <- array(rnorm(16*10*15*39))
dim(dat) <- c(16,10,15,39)
dimnames(dat) <- lapply(c(16,10,15,39),
                        function(x) paste('a',sample(1:1000,x,rep=F),sep=''))
dat2 <- apply(dat, c(2,3,4), function(x){x/sum(x)})
identical(dimnames(dat2) ,dimnames(dat))
[1] TRUE

I get the same dimnames for dat and dat2, so I must be missing something here.
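One thing worth noting about the test above: `lapply` returns an unnamed list, so `names(dimnames(dat))` is `NULL` to begin with and the comparison cannot show a lost name. A smaller sketch that does name the dimnames list might look like this (the names `row`, `col` and `layer` are made up for illustration, and what the last line prints may depend on the R version):

```r
set.seed(1)  # only so the random values are repeatable

# 3-d array whose dimnames list itself carries names.
dat <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4),
             dimnames = list(row   = c("r1", "r2"),
                             col   = c("c1", "c2", "c3"),
                             layer = paste0("l", 1:4)))

# Same kind of call as in the question: normalise along the first dimension.
dat2 <- apply(dat, c(2, 3), function(x) x / sum(x))

names(dimnames(dat))   # names before apply
names(dimnames(dat2))  # names after apply; compare with the line above
```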

agstudy

I can't reproduce this behavior using a non-casted array. Could you provide a reproducible example of what your dataframe/array actually looks like? Otherwise it is difficult to find out where the problem is.

Here is the code I used for testing this:

# example
a <- c(1,2,11,22)
b <- c(3,4,33,44)
c <- c(5,6,55,66)
d <- c(7,8,77,88)

test <- array(c(a,b,c,d), c(2,2,2,2), dimnames=list(
a=c("a1","a2"),
b=c("b1","b2"),
c=c("c1","c2"),
d=c("d1","d2") )
)
dimnames(test)
names(dimnames(test))

# apply
test2 <- apply(test, c(2,3,4), function(x){
  entry <- sum(x)
})
dimnames(test2)
names(dimnames(test2))

Sorry for the comment "disguised" as an answer. I am new to SO, and it seems you need a higher rep to post comments.

Edit: Your dimnames might get lost because, for whatever reason, the function you defined produces unnamed results. You can try saving x/sum(x) as an object (like I did) and then naming that object inside your function. I skipped the last part, because for me there were no missing names / dimnames.
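If naming inside the function turns out to be awkward, another option is to copy the dimnames back after `apply` has run. This is only a sketch with a made-up array `m`, and it assumes the function returns one value per element of the first dimension and that `MARGIN` lists all remaining dimensions in order, so the result has the same shape as the input:

```r
# Hypothetical small array shaped like the question's example.
m <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(sample_num = c("sample1", "sample2"),
                           name       = c("bob", "james", "sarah"),
                           year       = c("1995", "2005")))

fractions <- apply(m, c(2, 3), function(x) x / sum(x))

# Only copy the dimnames back if the shape really is unchanged.
stopifnot(all(dim(fractions) == dim(m)))
dimnames(fractions) <- dimnames(m)

names(dimnames(fractions))
```

Copying the whole `dimnames` list sidesteps the question of which names `apply` keeps, at the cost of relying on the shape assumption stated above.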

SimonG