Here is the code I have so far:
library(data.table)
setDT(df)[, .SD[which.min(Julian_Day)], by = .(species, Year)]
Example of the df:
df <- data.frame(
  year    = c(1901,1901,1901,1901,1901,1901,1901,1901,1901,1901,1901,1901,1901),
  temp    = c(29,25,21,26,20,20,26,25,24,23,23,24,26),
  habitat = c("fst","fld","city","city","fst","fld","fst","road","river","river","city","city","city"),
  species = c("blu","blu","pink","pink","pink","pink","pink","pink","pink","pink","pink","pink","pink"),
  day     = c(34,87,93,79,56,98,100,187,54,14,63,57,23)
)
What I want the new subset to look like:
dfout <- data.frame(
year=c(1901,1901,1901),
temp=c(29,25,23),
habitat=c("fst","fld","river"),
species=c("blu","blu","pink"),
day=c(34,87,14),
first10= c(NA,NA,23)
)
So the new subset would have a new column, first10, holding the mean temp of the first 10% of observations (ordered by day) for EACH species in EACH year (my data covers the years 1901-2000 and 100 species). As the example shows, the blu species has only 2 observations in 1901, so there is not enough data to compute a mean for the first 10%, and NA is returned. The pink species has 11 observations in 1901, so its first 10% is a single row (day 14). Secondly, the observations that were not used to calculate the first-10% mean are omitted from the new subset. If there were, say, 30 observations of the pink species in 1901, then 3 rows would be returned in the new subset, all with the same value in the first10 column.
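To make the rule concrete, here is a rough sketch of the logic I am after, run on the example data above. It is not a tested solution for the full data set. It rests on two assumptions that I have not confirmed are the right choices: the 10% count is rounded down (so 11 observations give 1 row and 30 give 3), and a group with fewer than 10 observations keeps all of its rows with first10 set to NA. The names dt and out are just placeholders.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  year    = rep(1901, 13),
  temp    = c(29,25,21,26,20,20,26,25,24,23,23,24,26),
  habitat = c("fst","fld","city","city","fst","fld","fst","road",
              "river","river","city","city","city"),
  species = c("blu","blu", rep("pink", 11)),
  day     = c(34,87,93,79,56,98,100,187,54,14,63,57,23)
)

dt <- as.data.table(df)

# Sort so that the earliest days come first within each species/year group
setorder(dt, species, year, day)

out <- dt[, {
  # Assumption: 10% of the group size, rounded down (integer division)
  n <- .N %/% 10L
  # Assumption: with fewer than 10 observations, keep every row and report NA
  keep <- if (n >= 1L) seq_len(n) else seq_len(.N)
  first10 <- if (n >= 1L) mean(temp[seq_len(n)]) else NA_real_
  # Return the kept rows of every column plus the (recycled) group mean
  c(lapply(.SD, `[`, keep), list(first10 = first10))
}, by = .(species, year)]

print(out)
```

On the example data this keeps both blu rows with first10 = NA and the single earliest pink row (day 14) with first10 = 23. If a different rounding rule or minimum group size is wanted, only the lines that compute n and keep should need to change.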