
I'm kind of new and still learning R. The few posts I've looked into haven't been very helpful so far.

So, my results' dataframe df.results looks like this:

         | Age  | Flock | Year | Heating  | Cooling
------------------------------------------------------
1        |  1   |  1    | 2010 | 266.5788 |    0
2        |  1   |  1    | 2010 | 275.4562 |    0
3        |  1   |  1    | 2010 | 285.1423 |    0
...
200000   |  15  |  28   | 2020 |-39.84244 |  275.8492
...
400000   |  35  |  45   | 2030 |-41.09734 |  284.5375
...             
900000   |  12  |  300  | 2040 |-42.22414 |  292.3389 
...
150000   |  22  |  181  | 2050 | 28.9140  |    0
...
250000   |  34  |  322  | 2070 | -38.5952 |  430.8928
...

So, Flock ranges from 1 to 322. And Year goes from 2010 to 2090 in steps of 10 (only 9 different values).

My goal is to create matrices with 322 rows (flocks) and 9 columns (year) with the sum of Heating (1st matrix) and Cooling (2nd matrix) per flock per year.

I tried this code:

list.years <- seq(2010, 2090, 10)
nyears <- length(list.years)
f <- 322

sum.heat <- matrix(0, f, length(nyears))
sum.cool <- matrix(0, f, length(nyears))


for(j in 1:nyears){
    for(i in 1:f){
       sum.heat[i,j] <- sum(df.results$Heating[df.results$Flock == i], na.rm = TRUE)
       sum.cool[i,j] <- sum(df.results$Cooling[df.results$Flock == i], na.rm = TRUE)
}}

For some reason, this is not working:

Error in `[<-`(`*tmp*`, i, j, value = sum(df.results$Ventilation[df.results$Flock ==  : subscript out of bounds

I've tried several ways found online but I can't figure out why mine is not working. I also tried using the "new matrices" as "data frames" with no success.

Much appreciated if anyone can help out or suggest different approaches to make this work.

(P. S. Please let me know if this isn't clear. I'm happy to edit or explain it differently).

Thanks!!

madmex

1 Answer


You can use the dcast() function from the data.table package instead of nested for loops to achieve this.

# data sample with 2 flocks, 3 years, & 2 entries per year per flock
set.seed(222)
df.sample <- data.frame(Flock = c(rep(1, 6), rep(2, 6)),
                        Year = rep(c(2010, 2020, 2030), 4),
                        Heating = rnorm(12, sd = 50),
                        Cooling = rnorm(12, mean = 100, sd = 30))

> df.sample
   Flock Year      Heating   Cooling
1      1 2010  74.38785448  79.22177
2      1 2020  -0.09459503 118.07947
3      1 2030  69.05103950  94.06741
4      1 2010 -19.01068157  64.42376
5      1 2020   9.20681152  39.83461
6      1 2030 -12.34479415 100.22530
7      2 2010 -60.77804548 115.58471
8      2 2020  78.07025492  77.61114
9      2 2030  21.36550986 121.79364
10     2 2010 -60.05117532 121.40970
11     2 2020  52.62292475  80.49811
12     2 2030 -65.25317830 144.96089

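library(data.table)

# convert the data frame to a data.table so data.table's dcast() method is used
dt.sample <- as.data.table(df.sample)

# one row per Flock, one column per Year, cells hold the summed Heating
dcast(dt.sample, Flock ~ Year, fun.aggregate = sum, value.var = "Heating")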

  Flock       2010       2020      2030
1     1   55.37717   9.112216  56.70625
2     2 -120.82922 130.693180 -43.88767

# same reshape for the summed Cooling
dcast(dt.sample, Flock ~ Year, fun.aggregate = sum, value.var = "Cooling")

  Flock     2010     2020     2030
1     1 143.6455 157.9141 194.2927
2     2 236.9944 158.1092 266.7545
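
Since the question asks for matrices rather than tables, here is a minimal base R sketch that returns a matrix directly. It uses tapply() instead of dcast(), so it is a different technique from the one above, and the names heat.mat and cool.mat are only illustrative. Flock/Year combinations with no rows would likely come back as NA rather than 0.

```r
# Same sample data as above: 2 flocks, 3 years, 2 entries per year per flock
set.seed(222)
df.sample <- data.frame(Flock = c(rep(1, 6), rep(2, 6)),
                        Year = rep(c(2010, 2020, 2030), 4),
                        Heating = rnorm(12, sd = 50),
                        Cooling = rnorm(12, mean = 100, sd = 30))

# tapply() groups by Flock (rows) and Year (columns) and applies sum()
# to each group, returning a matrix with Flock and Year as dimnames
heat.mat <- tapply(df.sample$Heating, list(df.sample$Flock, df.sample$Year), sum)
cool.mat <- tapply(df.sample$Cooling, list(df.sample$Flock, df.sample$Year), sum)

dim(heat.mat)       # rows = flocks, columns = years
colnames(heat.mat)  # the year labels
```

With the full df.results this should give one row per flock and one column per year, which is the layout described in the question.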

p.s. Generally, using for loops in R is a Bad Idea. Circle 3 of Patrick Burns' The R Inferno discusses this in some detail & is worth a read.
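
If you do want to keep the loop, the "subscript out of bounds" error most likely comes from matrix(0, f, length(nyears)): nyears is already a single number, so length(nyears) is 1 and the matrices get only one column. The original loop also never filters on Year, so every column would hold the same per-flock total. A minimal sketch of a corrected loop follows. The small df.results here is made-up stand-in data, not the real results.

```r
# Hypothetical stand-in for df.results: 2 flocks, 3 years, 2 rows per combination
df.results <- data.frame(
  Flock   = rep(1:2, each = 6),
  Year    = rep(c(2010, 2020, 2030), 4),
  Heating = c(10, 20, 30, 1, 2, 3, 40, 50, 60, 4, 5, 6),
  Cooling = c(0, 0, 5, 0, 0, 5, 7, 0, 0, 7, 0, 0)
)

list.years <- sort(unique(df.results$Year))
nyears <- length(list.years)   # already a count, so do not wrap it in length() again
f <- max(df.results$Flock)

# The number of columns must be nyears, not length(nyears)
dn <- list(as.character(seq_len(f)), as.character(list.years))
sum.heat <- matrix(0, f, nyears, dimnames = dn)
sum.cool <- matrix(0, f, nyears, dimnames = dn)

for (j in seq_len(nyears)) {
  for (i in seq_len(f)) {
    # select the rows for this flock AND this year
    rows <- df.results$Flock == i & df.results$Year == list.years[j]
    sum.heat[i, j] <- sum(df.results$Heating[rows], na.rm = TRUE)
    sum.cool[i, j] <- sum(df.results$Cooling[rows], na.rm = TRUE)
  }
}

sum.heat
```

This scans the whole data frame once per cell, so on a large df.results it will be much slower than the dcast() approach.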

Z.Lin
  • Many thanks Z.Lin. It works really well. This is exactly what I was looking for. My C++-trained mind keeps for loops as first options for almost everything. And thanks for the resource you provide, I assume it contains different packages to replace the for loops once and for all, right? – madmex Aug 28 '17 at 16:53
  • @JorgeIzar: The urge to default to loops isn't unique to C++, for those of us with experience in other languages. :) But it's good to remember that R is optimised for vectorised functions. The two packages people usually use are `data.table` & the [tidyverse](https://www.tidyverse.org/) collection (which includes `dplyr`, `tidyr`, etc). [This post](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) compares their relative merits. – Z.Lin Aug 29 '17 at 00:51