
I would like to run a "for" loop that uses three indices. Basically, I want to subset a data frame, find the mean of the subset, and place the mean value in a new data frame. I am having trouble running this loop; all I get is NaNs.

The first index selects the row of the new data frame (which I call data.avg). The second index selects an element of a vector used in the first half of the subsetting condition (that the dates fall in a specific month). The third index works the same way for the second half of the condition (that the row is a Breakfast, Dinner or Snacks serving).

# Create the data frame
data1 = data.frame(date = sort(rep(as.Date(42948:43101, origin = "1899-12-30"),3)),
               serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
               units = rep(c(1,5,49), 154)
)
View(data1[order(data1$date),])

# take mean of each subset and place it in a new data frame called data.avgs
# it should be a 6-row data frame: column 1 holds the months "August" through "January"
# (the data run from 2017-08-01 to 2018-01-01); the other columns hold the "Breakfast", "Dinner" and "Snacks" means
month.index = c(8:12, 1)
serving.index = c("Breakfast", "Dinner", "Snack")

# create the data frame with the means using placeholder data
data.avg = data.frame(months = c(month.name[8:12], month.name[1]),
                  bf.avg = c(1:6),
                  dinner.avg = c(1:6),
                  snack.avg = c(1:6))

# now start replacing; find the mean of the subset of the original data frame.
# find the mean of all dates that are for August, and whose serving type are for Breakfast. 

for(j in 1:6){
  for(i in month.index){
    for(v in 2:4){
      data.avg[j,v] = mean(
        subset(data1,
               months(data1$date) == month.name[i] & data1$serving == serving.index[v])$units
      )
    }
  }
}
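As a sanity check on the data itself: origin = "1899-12-30" is the usual epoch for spreadsheet-style serial day numbers, so 42948:43101 covers 2017-08-01 through 2018-01-01. The sketch below rebuilds data1 and counts rows per calendar month. It uses format(date, "%m") instead of months(), because months() returns month names in the session's locale while month.name is always English.

```r
# Rebuild the question's data frame (same values as data1 above)
data1 <- data.frame(date = sort(rep(as.Date(42948:43101, origin = "1899-12-30"), 3)),
                    serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
                    units = rep(c(1, 5, 49), 154))

# First and last dates in the data
print(range(data1$date))

# Rows per calendar month; "%m" gives "01", "08", ..., "12",
# independent of the locale's language
print(table(format(data1$date, "%m")))
```

With this data, January has only three rows (the single day 2018-01-01 times three servings), while August through December have 90 or 93 rows each.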

When I compute the mean without the loop, for example like this:

mean(subset(data1, 
            months(data1$date) == "September" & data1$serving == "Breakfast")$units)

I get the correct mean. Because of this, I am thinking that my issue may lie in the index setup.
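The index setup can be checked piece by piece, outside the full loop. This is a minimal sketch; last.month is a hypothetical tracer variable, not part of the original code. It shows three things: v in 2:4 runs past the end of a three-element serving.index, an empty subset gives a NaN mean, and nesting the j and i loops makes every row keep only the last month's value.

```r
month.index <- c(8:12, 1)
serving.index <- c("Breakfast", "Dinner", "Snack")   # as in the question

# 1. The inner loop runs v over 2:4, but serving.index has only three elements,
#    so v = 4 yields NA and v = 3 yields "Snack", which never occurs in data1.
print(serving.index[2:4])

# 2. A subset with no matching rows has a zero-length $units, and its mean is NaN.
print(mean(numeric(0)))

# 3. Hypothetical tracer: with the j and i loops nested, each row j is
#    rewritten once per month, so it keeps only the last month's value.
last.month <- character(6)
for (j in 1:6) {
  for (i in month.index) {
    last.month[j] <- month.name[i]   # overwritten on every pass of the i loop
  }
}
print(last.month)   # all six entries are "January"
```

The third point does not change the numbers for this particular data1, because units has the same value in every month, but it would with real data.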

Any and all help would be greatly appreciated.

Thanks

Edit: I fixed the code above. The resulting data frame is:

months bf.avg dinner.avg snack.avg
1    August      5         49       NaN
2 September      5         49       NaN
3   October      5         49       NaN
4  November      5         49       NaN
5  December      5         49       NaN
6   January      5         49       NaN

Here is what I am looking for:

> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Breakfast")$unit)
[1] 1
> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Dinner")$unit)
[1] 5
> mean(subset(data1, 
+             months(data1$date) == "September" & data1$serving == "Snacks")$unit)
[1] 49

My understanding is that these should be the values in data.avg[2, 2:4] (the September row).

im2wddrf
  • `help("aggregate")` – Roland Jan 29 '18 at 07:55
  • Possible duplicate of [Grouping functions (tapply, by, aggregate) and the \*apply family](https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family) – jogo Jan 29 '18 at 08:13
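Following the pointer to aggregate() and the grouping functions, the whole table can be built without an explicit loop. A minimal sketch with tapply(), assuming the data1 from the question (rebuilt here so the block runs on its own) and grouping by the numeric month from format() rather than months() to avoid locale-dependent names. The result columns are named after the servings, not bf.avg and so on.

```r
data1 <- data.frame(date = sort(rep(as.Date(42948:43101, origin = "1899-12-30"), 3)),
                    serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
                    units = rep(c(1, 5, 49), 154))

# Group key: calendar month as a number (8, 9, ..., 12, 1)
data1$month <- as.integer(format(data1$date, "%m"))

# Mean of units for every month x serving combination (a matrix)
avg <- tapply(data1$units, list(data1$month, data1$serving), mean)

# Reorder rows to August..January and attach English month names
month.index <- c(8:12, 1)
avg <- avg[as.character(month.index), ]
data.avg <- data.frame(months = month.name[month.index], avg)
rownames(data.avg) <- NULL   # drop the "8", "9", ... row names taken from avg
print(data.avg)

# The long-format equivalent with aggregate()
print(aggregate(units ~ month + serving, data = data1, FUN = mean))
```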

1 Answer


You set "Snack" in your serving.index, but the value in data1 is "Snacks", so that subset is empty and its mean is NaN.

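Then run the innermost loop over v in 1:3, so that serving.index[v] picks each of the three servings, and use this assignment inside it (serving v goes to column v + 1, because column 1 holds the month names):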

data.avg[j,v+1] = mean(
    subset(data1,months(data1$date) == month.name[i] & as.character(data1$serving) == serving.index[v])$units)

data.avg
     months bf.avg dinner.avg snack.avg
1    August      1          5        49
2 September      1          5        49
3   October      1          5        49
4  November      1          5        49
5  December      1          5        49
6   January      1          5        49
Terru_theTerror
  • Thanks! It fixed the first column, but the other two seem to be off (I calculated the means without running the loop). I think all of dinner.avg should be 5 and all of snack.avg should be 49 for every month. Would you know why I am getting consecutive numbers (1:6)? – im2wddrf Jan 29 '18 at 08:30
  • You are modifying only the second column: data.avg[j,2] – Terru_theTerror Jan 29 '18 at 08:35
  • Thank you! Error fixed. I had set up the indexing incorrectly; mostly typos on my part. – im2wddrf Jan 29 '18 at 08:48
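Putting the answer and the comments together, a corrected loop might look like the sketch below. This is a rewrite, not the answerer's exact code. It pairs row j with month.index[j] instead of nesting a separate month loop (in the question's loop, every row ends up holding the last month's mean). It also spells "Snacks" as in data1 and writes serving v to column v + 1. Comparing format(date, "%m") with the month number is a swap for months() so the test does not depend on the locale's language.

```r
data1 <- data.frame(date = sort(rep(as.Date(42948:43101, origin = "1899-12-30"), 3)),
                    serving = rep(c("Breakfast", "Dinner", "Snacks"), 154),
                    units = rep(c(1, 5, 49), 154))

month.index <- c(8:12, 1)
serving.index <- c("Breakfast", "Dinner", "Snacks")   # "Snacks", matching data1

# Placeholder frame: one row per month, one column per serving
data.avg <- data.frame(months = month.name[month.index],
                       bf.avg = NA_real_, dinner.avg = NA_real_, snack.avg = NA_real_)

# Numeric month of every row in data1, computed once (locale-independent)
row.month <- as.integer(format(data1$date, "%m"))

for (j in seq_along(month.index)) {        # row j <-> month.index[j]
  for (v in seq_along(serving.index)) {    # serving v -> column v + 1
    keep <- row.month == month.index[j] & as.character(data1$serving) == serving.index[v]
    data.avg[j, v + 1] <- mean(data1$units[keep])
  }
}
print(data.avg)
```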