I have a dataset that contains the feeding data of 3 animals, consisting of the animals' tag ids (1,2,3), the type (A,B) and amount (kg) of feed given at each 'meal':

Animal   FeedType   Amount(kg)
Animal1     A         10
Animal2     B         7
Animal3     A         4
Animal2     A         2
Animal1     B         5
Animal2     B         6
Animal3     A         2

Using this, I want to be able to output the matrix below which has unique('Animal') as its rows, unique('FeedType') as its columns and the cumulative Feed Amount (kg) in the corresponding cells of the matrix.

         A   B
Animal1  10  5
Animal2  2   13
Animal3  6   0

I started coding a solution using two for loops as below:

# Read the semicolon-separated file; check.names = FALSE keeps the 'Amount(kg)' header unchanged
dataframe <- read.delim(input_url, header = TRUE, sep = ";", check.names = FALSE)
animals <- unique(dataframe$Animal)
feed_types <- unique(dataframe$FeedType)
animal_feed_matrix <- matrix(0, length(animals), length(feed_types))
for (i in seq_along(animals)) {
  for (j in seq_along(feed_types)) {
    # Rows belonging to this animal / feed type combination
    rows <- dataframe$Animal == animals[i] & dataframe$FeedType == feed_types[j]
    animal_feed_matrix[i, j] <- sum(dataframe[rows, "Amount(kg)"])
  }
}

I am aware that this is a very inefficient way to tackle the problem. I also know that R has factors and levels, which I sense could solve it more elegantly.

P.S: This question is somewhat similar to mine but even if the solution to my problem is contained within, it escapes me.

Zhubarb

2 Answers

You can do this with the dcast() function from the reshape2 package.

library(reshape2)
# Rows = Animal, columns = FeedType, cells = sum of the Amount column
dcast(df,Animal~FeedType,sum,value.var="Amount")
   Animal  A  B
1 Animal1 10  5
2 Animal2  2 13
3 Animal3  6  0
Didzis Elferts
  • Thank you, but the code gives an error: [Using Subsector as value column: use value.var to override. Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"]. Also, in this one-liner, where do I specify that the 'sum' is on Amount? – Zhubarb Aug 06 '13 at 09:13
  • dcast() automatically uses the third column as the value column when there are only three columns. I have updated the example to a more general solution with the argument value.var= (to show which column to use). As for your error, it seems that sum is being used as a variable in your session. See this [SO question](http://stackoverflow.com/questions/7082792/error-message-running-the-example-from-the-reshape2-help-page) – Didzis Elferts Aug 06 '13 at 09:22
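
For reference, here is a self-contained sketch of the dcast() call above. It assumes the question's sample data is typed in by hand as a data frame named `df`, with the amount column named `Amount` (both names follow the answer, not necessarily the asker's file). Naming the `fun.aggregate` argument explicitly shows where the sum comes in:

```r
library(reshape2)

# Sample data from the question, entered by hand (column names are assumptions)
df <- data.frame(
  Animal   = c("Animal1", "Animal2", "Animal3", "Animal2", "Animal1", "Animal2", "Animal3"),
  FeedType = c("A", "B", "A", "A", "B", "B", "A"),
  Amount   = c(10, 7, 4, 2, 5, 6, 2)
)

# One row per Animal, one column per FeedType, cells summed over Amount
wide <- dcast(df, Animal ~ FeedType, fun.aggregate = sum, value.var = "Amount")
print(wide)
```

The result is a data frame, so the animal names end up in an `Animal` column rather than in row names.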

In base R:

out <- with(mydf, tapply(Amount, list(Animal, FeedType), sum))

         A  B
Animal1 10  5
Animal2  2 13
Animal3  6 NA

Then, to change NA to 0 (as in your example), just do:

out[is.na(out)] <- 0
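
A self-contained version of the two steps above, as a sketch: `mydf` is typed in from the question's table, with the amount column assumed to be named `Amount`. The `xtabs()` call is included only as a base R alternative for comparison; it fills empty combinations with 0 directly.

```r
# Sample data from the question, entered by hand (column names are assumptions)
mydf <- data.frame(
  Animal   = c("Animal1", "Animal2", "Animal3", "Animal2", "Animal1", "Animal2", "Animal3"),
  FeedType = c("A", "B", "A", "A", "B", "B", "A"),
  Amount   = c(10, 7, 4, 2, 5, 6, 2)
)

# Sum Amount for every Animal x FeedType combination
out <- with(mydf, tapply(Amount, list(Animal, FeedType), sum))

# Combinations that never occur come back as NA; replace them with 0
out[is.na(out)] <- 0
print(out)

# Alternative: xtabs() sums the left-hand side over the factors on the right
out2 <- xtabs(Amount ~ Animal + FeedType, data = mydf)
print(out2)
```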
Thomas
  • Wow, that is quite powerful for a single line of code. Works like a charm! Just a brief question: the 'out' matrix is too sparse now because I have too many FeedType columns. How can I omit the FeedTypes that occur fewer than a certain number of times? (e.g. I want to exclude column 'FeedTypeA' in 'out' if in 'mydf' it has occurred only once) – Zhubarb Aug 06 '13 at 09:20
  • @Berkan I'd probably preprocess the original dataframe, excluding FeedTypes below a certain threshold before you run `tapply`. But you can also just remove columns from the resulting matrix with something like: `out[,!colnames(out)=="A",drop=FALSE]` (the `drop=FALSE` is unnecessary in large matrices, but prevents coercing the example matrix with just two columns to a vector). – Thomas Aug 06 '13 at 09:26
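
The preprocessing idea in the comment above could look roughly like this. It is only a sketch under assumptions: `mydf` is the question's sample data with an `Amount` column, and `min_count` is a hypothetical threshold name introduced here for illustration.

```r
# Sample data from the question, entered by hand (column names are assumptions)
mydf <- data.frame(
  Animal   = c("Animal1", "Animal2", "Animal3", "Animal2", "Animal1", "Animal2", "Animal3"),
  FeedType = c("A", "B", "A", "A", "B", "B", "A"),
  Amount   = c(10, 7, 4, 2, 5, 6, 2)
)

# Hypothetical threshold: keep feed types that occur at least this many times
min_count <- 4

# Count how often each FeedType occurs and keep the frequent ones
counts <- table(mydf$FeedType)
keep <- names(counts)[counts >= min_count]

# Drop the rare feed types before aggregating
# (if FeedType is a factor, also call droplevels() so empty levels do not become columns)
filtered <- mydf[mydf$FeedType %in% keep, ]

out <- with(filtered, tapply(Amount, list(Animal, FeedType), sum))
out[is.na(out)] <- 0
print(out)
```

In the sample data feed type A occurs four times and B three times, so a threshold of 4 leaves a one-column matrix containing only A.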