1

As I am wont to do, I'm keeping tabs on my cats using matrices.

catWeights <- cbind(fluffy=c(5.0,5.1,5.2,5.3),misterCuddles=c(1.2,1.3,1.4,1.5),captainMew=c(4.3,4.2,4.1,4.0))
catTypes <- cbind(fluffy=c('cat','cat','cat','cat'),misterCuddles=c('kitten','kitten','kitten','cat'),captainMew=c('cat','cat','cat','cat'))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
row.names(catWeights) <- dates
row.names(catTypes) <- dates

On any date, I know how much each of them weighs:

> catWeights
           fluffy misterCuddles captainMew
2013-01-01    5.0           1.2        4.3
2013-01-02    5.1           1.3        4.2
2013-01-03    5.2           1.4        4.1
2013-01-04    5.3           1.5        4.0

And I know whether they're cats or kittens:

> catTypes
           fluffy misterCuddles captainMew
2013-01-01 "cat"  "kitten"      "cat"     
2013-01-02 "cat"  "kitten"      "cat"     
2013-01-03 "cat"  "kitten"      "cat"     
2013-01-04 "cat"  "cat"         "cat"  

How can I tell how much all my cats and all my kittens weigh through time?

I want this:

> totalWeights

             cat    kitten
2013-01-01   9.3       1.2
2013-01-02   9.3       1.3
2013-01-03   9.3       1.4
2013-01-04  10.8       0.0

On the fourth of January, Mister Cuddles turned 1, so he was no longer a kitten. His weight moved from the kitten bucket to the cat bucket.

Konrad Rudolph
dvmlls
  • Might be better in future to store your data in long format, [like in this question](http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format). – Blue Magister Jan 31 '14 at 01:18
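To illustrate the long-format suggestion in the comment above, here is a minimal sketch using only base R. The names `catsLong` and `totalsLong` are made up for this example and are not part of the question; the data is the sample data from the question.

```r
# Same sample data as in the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

# One row per (date, animal) pair; as.vector() unrolls each matrix column by column,
# so the rep() calls below line up with the unrolled values
catsLong <- data.frame(
  date   = rep(rownames(catWeights), times = ncol(catWeights)),
  name   = rep(colnames(catWeights), each = nrow(catWeights)),
  type   = as.vector(catTypes),
  weight = as.vector(catWeights)
)

# Sum the weights within each date/type combination;
# combinations with no animals come out as 0
totalsLong <- xtabs(weight ~ date + type, data = catsLong)
totalsLong
```

With the data in this shape, adding a new category (say, a hypothetical "mountain lion" label) only changes the values in the `type` column, not the code.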

3 Answers

4

This seems valid using the sample data:

do.call(cbind, 
      lapply(c("cat", "kitten"), 
            function(x) rowSums(catWeights * (catTypes == x))))
#           [,1] [,2]
#2013-01-01  9.3  1.2
#2013-01-02  9.3  1.3
#2013-01-03  9.3  1.4
#2013-01-04 10.8  0.0

EDIT:

As @BlueMagister commented ... lapply(unique(as.vector(catTypes)), ... is the more general form of the answer. I guess, though, you've already found a way to overcome this, since you accepted the answer. The as.vector is because unique has a matrix method that is not convenient in this specific case.
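A small sketch of why the `as.vector` matters, using the question's `catTypes` matrix (the variable names `uniqueRows` and `uniqueLabels` are only for illustration):

```r
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))

# The matrix method of unique() removes duplicated ROWS and returns a matrix
uniqueRows <- unique(catTypes)
dim(uniqueRows)

# Dropping the dim attribute first gives the distinct labels as a plain vector
uniqueLabels <- unique(as.vector(catTypes))
uniqueLabels
```

Passing `uniqueRows` to `lapply` would iterate over every cell of that matrix, repeating labels, which is why the vector form is the one to use here.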

Also, since I'm in editing mode, I'll note that sapply could've been used, but based on some rough benchmarks I've made from time to time, I've found lapply to be faster even if it is accompanied by a do.call(r/cbind, ..) or an unlist. I did not test it for a larger dataset in this specific case, though.

So, another format of the answer could've been:

sapply(unique(as.vector(catTypes)), 
             function(x) rowSums(catWeights * (catTypes == x)))
alexis_laz
  • +1. More generalized: `unique(catTypes)` instead of `c("cat", "kitten")`. Then set the column names of the matrix to `unique(catTypes)`. – Blue Magister Jan 31 '14 at 01:17
  • On a 2500x2500 matrix with 10 cat varietals, `microbenchmark` indicates that two approaches are similar, speed-wise. I'll post the results as an answer below. Thanks! – dvmlls Jan 31 '14 at 16:54
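Putting the comment about column names together with the `as.vector` note from the answer, a generalized version might look like the sketch below. It assumes the question's sample data; `types` is a name chosen for this example.

```r
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

# Distinct labels, whatever they happen to be
types <- unique(as.vector(catTypes))

# One column of row sums per label; (catTypes == x) zeroes out non-matching cells
totalWeights <- do.call(cbind,
                        lapply(types,
                               function(x) rowSums(catWeights * (catTypes == x))))

# Label the columns, as suggested in the comment
colnames(totalWeights) <- types
totalWeights
```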
0

Here is a not very general answer that applies only to the example data set.

# Construct matrices for the cat weights and kitten weights.
# Pass the full weight matrix as the "yes" value: ifelse() picks values by position,
# so a subset such as catWeights[catTypes=="cat"] would be recycled out of alignment.
catWts <- ifelse(catTypes=="cat", catWeights, 0)
kittenWts <- ifelse(catTypes=="kitten", catWeights, 0)

# Then just take the row sums for the two matrices
catSums <- rowSums(catWts)
kittenSums <- rowSums(kittenWts)

# Then combine them into a data frame
totalWeights <- data.frame(cat=catSums, kitten=kittenSums)

# In one line
data.frame(cat=rowSums(ifelse(catTypes=="cat", catWeights, 0)),
           kitten=rowSums(ifelse(catTypes=="kitten", catWeights, 0)))

#            cat kitten
#2013-01-01  9.3    1.2
#2013-01-02  9.3    1.3
#2013-01-03  9.3    1.4
#2013-01-04 10.8    0.0

I would imagine that there is a more general approach to solving this problem.

ialm
  • I need a more general solution because I'm also keeping track of very old cats, infant cats, and other cats that turn out to be small mountain lions when they get older. – dvmlls Jan 30 '14 at 19:39
0

Microbenchmarking alexis_laz's two solutions on a 2500x2500 matrix with 10 groups:

> microbenchmark(cbindLapply(), sapplyOnly(), times=100)
Unit: milliseconds
          expr      min       lq   median       uq      max neval
 cbindLapply() 841.4796 865.2220 879.9099 892.6265 990.5915   100
  sapplyOnly() 846.3675 869.7372 879.0286 901.3314 979.6136   100
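The definitions of `cbindLapply()` and `sapplyOnly()` are not shown above. The sketch below is one plausible way to set up such a comparison, not the exact code behind the timings: the random data, the group labels, and the smaller matrix size are all assumptions, and base R's `system.time` stands in for `microbenchmark` so that no extra package is needed.

```r
set.seed(1)
n <- 500                          # assumption: smaller than 2500 so the sketch runs quickly
groups <- paste0("type", 1:10)    # assumption: 10 made-up group labels
weights <- matrix(runif(n * n), nrow = n)
types <- matrix(sample(groups, n * n, replace = TRUE), nrow = n)

# lapply over the groups, then bind the resulting vectors into columns
cbindLapply <- function() {
  do.call(cbind, lapply(groups, function(x) rowSums(weights * (types == x))))
}

# sapply simplifies the per-group vectors into a matrix directly
sapplyOnly <- function() {
  sapply(groups, function(x) rowSums(weights * (types == x)))
}

# Sanity check: both approaches should give the same numbers (ignoring dimnames)
stopifnot(isTRUE(all.equal(unname(cbindLapply()), unname(sapplyOnly()))))

# Rough single-run timings; microbenchmark gives more reliable distributions
system.time(cbindLapply())
system.time(sapplyOnly())
```

Both functions do the same elementwise work per group, so similar timings are what one would expect; the difference is only in how the per-group results are assembled.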
dvmlls