I have a data set that contains sites, years, and a measured variable (call it x). x is measured several times a year, across many years, and at multiple sites. Here is an example of my data set (each x was collected on a different date; I've extracted the year from the dates because I'm interested in annual means). Let's call the data set df:
>df
site year x
a 2000 10
a 2000 12
a 2000 13
b 2000 14
b 2000 15
b 2000 17
c 2000 9
c 2000 11
c 2000 11
a 2001 11
a 2001 12
a 2001 12
b 2001 13
...
and it goes on for multiple years.
I want to compute the mean of x for each combination of site and year. I wrote a for loop, but I'm having trouble with it. I'd like it to return a data frame with site, year, and the average of x, but instead it seems to return the mean of all values in df$x as the first result, and then NaN for every remaining result.
Here is my code:
temp <- NULL;
mn.x <- NULL;
a <- NULL;
for(i in unique(df$site)) {
  for (j in unique(df$year)) {
    site <- i;
    year <- j;
    a <- data.frame(site, year);
    temp <- mean(na.omit(df$x[df$site==i && df$year==j]))
    site.year <- data.frame(a, temp)
    mn.x <- rbind(temp, site.year)
  }
}
To be clear, the result returned when I type mn.x in R is
>mn.x
[1] 10.4
[1] NaN
[1] NaN
[1] NaN
[1] NaN
...
where 10.4 is the mean of all values of df$x (i.e., mean(df$x)).
What's wrong with my loop? Or, since this is only an example data set, could there be a problem with my actual data? For what it's worth, class(df$x) is "numeric".
Thanks for any thoughts,
Paul
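
For reference, here is a minimal sketch of what the loop seems to be aiming for. It assumes two things that the post does not confirm: that `&&` is the cause of the first symptom (it evaluates a single TRUE/FALSE rather than comparing row by row, and recent R versions raise an error when it is given longer vectors), and that `rbind(temp, site.year)` was meant to append to `mn.x`. The `df` below is rebuilt from only the rows shown above, and the names `rows`, `mean.x`, and `mn.x2` are made up for illustration.

```r
# Rebuild only the rows shown in the post (the real data set has more).
df <- data.frame(
  site = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a", "a", "a", "b"),
  year = c(rep(2000, 9), rep(2001, 4)),
  x    = c(10, 12, 13, 14, 15, 17, 9, 11, 11, 11, 12, 12, 13),
  stringsAsFactors = FALSE
)

# Loop version with two changes from the original:
#   1. `&` (element-wise) instead of `&&` (single value) when selecting rows.
#   2. rbind() appends to the accumulated result `mn.x`, not to `temp`.
mn.x <- NULL
for (i in unique(df$site)) {
  for (j in unique(df$year)) {
    rows <- df$site == i & df$year == j
    if (!any(rows)) next  # skip site/year combinations with no data
    site.year <- data.frame(site = i, year = j,
                            mean.x = mean(df$x[rows], na.rm = TRUE))
    mn.x <- rbind(mn.x, site.year)
  }
}

# Loop-free alternative using base R's aggregate().
mn.x2 <- aggregate(x ~ site + year, data = df, FUN = mean)
```

Both `mn.x` and `mn.x2` should hold one row per site/year combination that actually occurs in the data. The `aggregate()` form is usually the shorter route when the goal is just grouped means.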