
I am new to R and Stack Overflow, and I'm looking for help with a specific problem.

I have a data set with columns for year, age, ID, horn length, etc. Each year contains several IDs, and each ID appears in several years.

I want to write a `for` loop that:

  1. subsets the data into years
  2. calculates the mean horn length within each year
  3. prints the mean to the screen

This is what I have so far:

```r
BH_YEAR <- split(Bighorn, as.factor(Bighorn$Year))
for(i in 1:length(BH_YEAR)) {
  cat(mean(Bighorn$HornLength, na.rm=TRUE))
}
```

but it just prints the mean horn length of the whole data set 24 times.

Any help is very much appreciated.

nrussell
Molly
  • Please include an example dataset. – akrun Dec 29 '14 at 15:58
  • Use the so-called [split-apply-combine](http://www.jstatsoft.org/v40/i01/paper) approach as implemented in package plyr (or dplyr or data.table or various base functions like, e.g., `aggregate`). – Roland Dec 29 '14 at 16:00
  • To add a sample of your data, you can use `dput(Bighorn)` and copy & paste the output into your question. Also, when you have a few minutes, please read through some of the answers [in this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – nrussell Dec 29 '14 at 16:00
  • Does this have to be in a `for()` loop? I think using `aggregate()` like @Roland suggested above would be much easier: `aggregate(Bighorn$HornLength, by = list(Bighorn$Year), mean, na.rm = TRUE)` – Steven Dec 29 '14 at 16:45
  • Think about what you are doing. You're looping through the different splits in `BH_YEAR`, but you do not reference `BH_YEAR` in the loop. Why do you think the results for each iteration will be different? If you replace `mean(Bighorn$HornLength,...)` with `mean(BH_YEAR[[i]]$HornLength,...)` the loop should work. But you are still *much* better off with `aggregate(HornLength~Year,Bighorn,mean,na.rm=TRUE)` – jlhoward Dec 29 '14 at 16:59
  • Thanks for the help. I'm new to this, so I didn't know how to include the dataset. I'll remember for next time. And yes, it does have to be a `for` loop. – Molly Dec 29 '14 at 22:46
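Following jlhoward's suggestion, the loop only has to read from the current piece of the split, `BH_YEAR[[i]]`, instead of the full `Bighorn` data. A minimal sketch, assuming a small made-up data set with the question's column names (`Year`, `ID`, `HornLength`), since the real data was not posted; `year_means` is a hypothetical helper vector added only so the results are also kept after the loop:

```r
# Hypothetical toy data: the real Bighorn data set was not posted,
# so these values are made up for illustration only.
Bighorn <- data.frame(
  Year       = c(2001, 2001, 2002, 2002, 2003),
  ID         = c("A", "B", "A", "C", "B"),
  HornLength = c(30, 34, 32, NA, 36)
)

# 1. Subset the data into years: a named list with one data frame per year
BH_YEAR <- split(Bighorn, as.factor(Bighorn$Year))

# Hypothetical helper: keep each year's mean as well as printing it
year_means <- numeric(length(BH_YEAR))
names(year_means) <- names(BH_YEAR)

for (i in seq_along(BH_YEAR)) {
  # 2. Mean horn length of the i-th year's subset, not of the whole data set
  year_means[i] <- mean(BH_YEAR[[i]]$HornLength, na.rm = TRUE)
  # 3. Print the year and its mean on its own line
  cat("Year", names(BH_YEAR)[i], "mean horn length:", year_means[i], "\n")
}
```

Using `seq_along(BH_YEAR)` rather than `1:length(BH_YEAR)` also avoids looping over `c(1, 0)` if the list is ever empty.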

1 Answer


1. To split the data frame by a particular column, your approach is fine:

```r
BH_YEAR <- split(Bighorn, as.factor(Bighorn$Year))
```
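`split()` returns a named list with one data frame per year. As a sketch of the remaining "apply" and "combine" steps without an explicit loop, `sapply()` can compute the mean of each piece and collect the results into a named vector. This is a swapped-in base-R alternative, not the method used in this answer, and the data below is made up with the question's column names:

```r
# Hypothetical toy data with the question's column names
Bighorn <- data.frame(
  Year       = c(2001, 2001, 2002, 2002, 2003),
  HornLength = c(30, 34, 32, NA, 36)
)

# Split: a named list with elements "2001", "2002", "2003"
BH_YEAR <- split(Bighorn, as.factor(Bighorn$Year))

# Apply + combine: mean of each year's subset, returned as a named numeric vector
year_means <- sapply(BH_YEAR, function(d) mean(d$HornLength, na.rm = TRUE))
print(year_means)
```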

2 & 3. If your ultimate aim is to summarize one column grouped by another, there is no need to use `split`. Use `aggregate` for this purpose:

```r
aggregate(HornLength ~ Year, data = Bighorn, FUN = mean)
```
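A quick check of the formula version on hypothetical toy data (made-up values, same column names as the question). R column names are case-sensitive, so the formula has to use `HornLength` and `Year` exactly as they appear in `Bighorn`. With the formula interface, the default `na.action = na.omit` drops rows where `HornLength` is `NA`, so `na.rm = TRUE` is not needed here:

```r
# Hypothetical toy data with the question's column names
Bighorn <- data.frame(
  Year       = c(2001, 2001, 2002, 2002, 2003),
  HornLength = c(30, 34, 32, NA, 36)
)

# One row per Year, holding the mean HornLength for that year;
# the row with a missing HornLength is dropped by na.action = na.omit
res <- aggregate(HornLength ~ Year, data = Bighorn, FUN = mean)
print(res)
```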
Shubham Saini