I'm currently stuck with my data frame, and I would like to know how to take "subsets of subsets of subsets". Here is part of my data frame:
YEAR RN DATE NAME SITE LONG SP SUMNB NB100
1 2011 RNN027 15056 ESTAGNOL RNN027-Estagnol 02 310 Anthocharis cardamines (Linnaeus, 1758) 1 0.3225806
2 2011 RNN027 15075 ESTAGNOL RNN027-Estagnol 02 310 Anthocharis cardamines (Linnaeus, 1758) 1 0.3225806
3 2003 RNN027 12166 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
4 2006 RNN027 13252 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
5 2006 RNN027 13257 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
6 2005 RNN027 12895 ESTAGNOL RNN027-Estagnol 01 540 Anthocharis cardamines (Linnaeus, 1758) 2 0.3703704
My goal is to compute an abundance factor for each species. To do that, I have to isolate every count date for every species, every year, and every site.
My first idea was to use nested loops, subsetting at each step by the previous criterion:
DF --> loop over SITE; subset for each SITE --> loop over YEAR; subset for each YEAR --> loop over SP; subset for each species --> dates of observation
I need to isolate these dates because they require further modification (adding rows), and afterwards I need to be able to write the modified subsets back and reconstruct a new data frame.
I built my loops like this:
LOOPSITE <- sort(unique(DF$SITE))
for (i in LOOPSITE) {
  print(i)
  LOOPSITESUB <- subset(DF, grepl(i, SITE))
  LOOPYEAR <- sort(unique(LOOPSITESUB$YEAR))
  print(LOOPYEAR)
  for (j in LOOPYEAR) {
    print(j)
    LOOPYEARSUB <- subset(LOOPSITESUB, grepl(j, YEAR))
    LOOPSP <- sort(unique(LOOPYEARSUB$SP))
    print(length(LOOPSP))
    for (k in LOOPSP) {
      print(k)
      LOOPSPSUB <- subset(LOOPYEARSUB, grepl(k, SP))
      print(sum(LOOPYEARSUB$SUMNB))
      print(head(LOOPSPSUB))
    }
  }
}
Thanks to all these print() calls I can follow the script as it runs, and it works until it reaches the species subsetting. For an unknown reason, the last subsetting does not work for every species, only for some of them. Here is part of the output for the last SITE and the last YEAR:
"RNN027-Estagnol 01"
...(I skipped all the sites)
"RNN027-Estagnol 06"
"2003"
...(I skipped all the years)
"2011"
[1] 22
[1] "Aricia agestis D., 1775"
[1] 107
YEAR RN DATE NOM SITE LONG SP SUMNB NB100
66 2011 RNN027 2011-04-21 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 1 0.3846154
67 2011 RNN027 2011-05-22 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 1 0.3846154
68 2011 RNN027 2011-08-05 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 2 0.7692308
[1] "Brintesia circe (Fabricius, 1775)"
[1] 107
[1] YEAR RN DATE NOM SITE LONG SP SUMNB NB100
<0 rows> (or 0-length row.names)
[1] "Carcharodus alceae (Esper, 1780)"
[1] 107
[1] YEAR RN DATE NOM SITE LONG SP SUMNB NB100
<0 rows> (or 0-length row.names)
It works for "Aricia agestis D., 1775" but not for "Brintesia circe (Fabricius, 1775)". I checked my data frame: the second species was observed at this time and place, and its name has the same format as the first one, so it should work.
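A likely cause worth checking: grepl() treats its pattern as a regular expression unless fixed = TRUE. In "Aricia agestis D., 1775" the only special character is ".", which matches any character (a literal dot included), so the pattern still matches its own name. In "Brintesia circe (Fabricius, 1775)" the parentheses are regex grouping characters, so the pattern no longer matches the literal text. This small sketch uses only the two species names from the output to show the difference, plus two literal-matching alternatives.

```r
# Two species names copied from the output above
sp <- c("Aricia agestis D., 1775", "Brintesia circe (Fabricius, 1775)")

# grepl() reads its pattern as a regular expression by default.
# "." matches any character, so this pattern still matches its own name:
grepl("Aricia agestis D., 1775", sp)              # TRUE FALSE
# "(" and ")" form a regex group, so this pattern looks for the text
# "Brintesia circe Fabricius, 1775" (no parentheses) and finds nothing:
grepl("Brintesia circe (Fabricius, 1775)", sp)    # FALSE FALSE

# Literal comparisons behave as expected
grepl("Brintesia circe (Fabricius, 1775)", sp, fixed = TRUE)  # FALSE TRUE
sp == "Brintesia circe (Fabricius, 1775)"                     # FALSE TRUE
```

In the loop, the species step would then read LOOPSPSUB <- subset(LOOPYEARSUB, SP == k). Using == for SITE and YEAR as well avoids partial matches, such as one site name being contained in another. Separately, print(sum(LOOPYEARSUB$SUMNB)) prints the year total (107) for every species; LOOPSPSUB$SUMNB may be what was intended.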
How many loops can I nest like this? Is there another way to do it (ideally more convenient and faster)? I'm aware of the split() function, which basically breaks the data into every group, but since I can't see how to work with each "chunk", it doesn't seem to fit my task. Maybe I'm wrong.
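One possible alternative to nested loops is split-apply-combine. split() accepts several grouping columns at once and returns a named list with one data frame per SITE/YEAR/SP combination, and lapply() or sapply() then runs the same function on every chunk, so there is no nesting depth to worry about. The DF below is hypothetical toy data (the three Aricia rows come from the output above, the two Brintesia rows are made up), and drop = TRUE keeps only the combinations that actually occur.

```r
# Hypothetical toy data: the three Aricia rows come from the output above,
# the two Brintesia rows are made up for illustration
DF <- data.frame(
  YEAR  = c(2011, 2011, 2011, 2011, 2003),
  DATE  = as.Date(c("2011-04-21", "2011-05-22", "2011-08-05",
                    "2011-07-02", "2003-07-10")),
  SITE  = c(rep("RNN027-Estagnol 06", 4), "RNN027-Estagnol 03"),
  SP    = c(rep("Aricia agestis D., 1775", 3),
            rep("Brintesia circe (Fabricius, 1775)", 2)),
  SUMNB = c(1, 1, 2, 1, 3)
)

# One data frame per SITE/YEAR/SP combination that actually occurs
groups <- split(DF, DF[c("SITE", "YEAR", "SP")], drop = TRUE)

length(groups)                              # 3 combinations here
sapply(groups, nrow)                        # count dates per combination
lapply(groups, function(d) sort(d$DATE))    # the count dates themselves
```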
As a last step (after modifying all the subsets), I should be able to write each subset into a new data frame to reconstruct a modified version of my input.
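Once every chunk in a split() list has been modified, do.call(rbind, ...) is one common way to stack the list back into a single data frame. In this sketch the "modification" is only a placeholder (a hypothetical TOTAL column holding each chunk's summed SUMNB) standing in for whatever change is really needed, and DF is hypothetical toy data.

```r
# Hypothetical toy data (Aricia rows from the output above, Brintesia made up)
DF <- data.frame(
  YEAR  = c(2011, 2011, 2011, 2011, 2003),
  DATE  = as.Date(c("2011-04-21", "2011-05-22", "2011-08-05",
                    "2011-07-02", "2003-07-10")),
  SITE  = c(rep("RNN027-Estagnol 06", 4), "RNN027-Estagnol 03"),
  SP    = c(rep("Aricia agestis D., 1775", 3),
            rep("Brintesia circe (Fabricius, 1775)", 2)),
  SUMNB = c(1, 1, 2, 1, 3)
)
groups <- split(DF, DF[c("SITE", "YEAR", "SP")], drop = TRUE)

# Placeholder modification: add a hypothetical TOTAL column to each chunk
groups <- lapply(groups, function(d) {
  d$TOTAL <- sum(d$SUMNB)
  d
})

# Stack the modified chunks back into one data frame
DF2 <- do.call(rbind, groups)
rownames(DF2) <- NULL   # drop the row names rbind() builds from chunk names
```

unsplit() is another option that restores the original row order, but it expects each chunk to keep its original number of rows, so it does not fit once extra rows are added.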
Maybe I'm going about this in entirely the wrong way! I can provide further explanations if needed.
Thanks for your help!
EDIT:
I'll try to explain what I want to do. To calculate my abundance index, I need to add "blank" rows before and after each temporal "session" of observation. Basically, I am trying to obtain a subset for every combination of 3 different factors (SITE, YEAR and SP).
Here is an example of the type of output I would like to obtain for every possible SITE X / YEAR Y / SP Z combination:
YEAR RN DATE NAME SITE LONG SP SUMNB NB100
----ADD A NEW ROW----DATE MINUS 7 DAYS-----------------------------------------------------------------------------------
1 Y RNN027 15056 ESTAGNOL RNN027-Estagnol X 310 SP Z 1 0.3225806
2 Y RNN027 15075 ESTAGNOL RNN027-Estagnol X 310 SP Z 1 0.3225806
3 Y RNN027 12166 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
4 Y RNN027 13252 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
5 Y RNN027 13257 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
6 Y RNN027 12895 ESTAGNOL RNN027-Estagnol X 540 SP Z 2 0.3703704
----ADD A NEW ROW----DATE PLUS 7 DAYS-----------------------------------------------------------------------------------
Then I combine every modified subset into a new DF.
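A sketch of how the padding could look, under several assumptions. add_blank_rows() is a hypothetical helper that treats each SITE/YEAR/SP chunk as a single session (as in the example above), copies its first and last rows, shifts their DATE by 7 days, and sets SUMNB and NB100 to 0 on the new rows. Whether "blank" should mean 0 or NA depends on how the abundance index is computed, so that choice is an assumption. DATE is assumed to be of class Date, as in the printed output; the same subtraction and addition also work if it is stored as a number of days. The toy DF is hypothetical.

```r
# Hypothetical toy data (Aricia rows from the output above, Brintesia made up)
DF <- data.frame(
  YEAR  = c(2011, 2011, 2011, 2011, 2003),
  DATE  = as.Date(c("2011-04-21", "2011-05-22", "2011-08-05",
                    "2011-07-02", "2003-07-10")),
  SITE  = c(rep("RNN027-Estagnol 06", 4), "RNN027-Estagnol 03"),
  SP    = c(rep("Aricia agestis D., 1775", 3),
            rep("Brintesia circe (Fabricius, 1775)", 2)),
  SUMNB = c(1, 1, 2, 1, 3),
  NB100 = c(0.3846154, 0.3846154, 0.7692308, NA, NA)
)

# Hypothetical helper: pad one SITE/YEAR/SP chunk with a row `gap` days
# before its first count and a row `gap` days after its last count
add_blank_rows <- function(d, gap = 7) {
  d <- d[order(d$DATE), ]
  before <- d[1, ]          # copies SITE, YEAR, SP, ... from the first row
  after  <- d[nrow(d), ]    # ... and from the last row
  before$DATE <- before$DATE - gap
  after$DATE  <- after$DATE + gap
  # Assumption: a "blank" row means zero individuals counted
  before[c("SUMNB", "NB100")] <- 0
  after[c("SUMNB", "NB100")]  <- 0
  rbind(before, d, after)
}

# Pad every chunk, then stack everything into one new data frame
groups <- split(DF, DF[c("SITE", "YEAR", "SP")], drop = TRUE)
padded <- do.call(rbind, lapply(groups, add_blank_rows))
rownames(padded) <- NULL
```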
EDIT 2: Using split(DF, list(DF$SITE, DF$YEAR, DF$SP)) crashed my computer unless I dropped the unused combinations. With that I got exactly what I want, but how can I access and modify each subset?
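The list returned by split() can be indexed like any other list: by position with [[1]], or by the name split() builds by joining the SITE, YEAR and SP values with ".". Assigning back with groups[[nm]] <- ... replaces a chunk in the list. This sketch uses hypothetical toy data and sorts each chunk by DATE as a stand-in modification. On the crash: without drop = TRUE, split() creates a chunk for every possible combination of SITE, YEAR and SP values, most of them empty, which is probably why it was so heavy.

```r
# Hypothetical toy data (Aricia rows from the output above, Brintesia made up)
DF <- data.frame(
  YEAR  = c(2011, 2011, 2011, 2011, 2003),
  DATE  = as.Date(c("2011-04-21", "2011-05-22", "2011-08-05",
                    "2011-07-02", "2003-07-10")),
  SITE  = c(rep("RNN027-Estagnol 06", 4), "RNN027-Estagnol 03"),
  SP    = c(rep("Aricia agestis D., 1775", 3),
            rep("Brintesia circe (Fabricius, 1775)", 2)),
  SUMNB = c(1, 1, 2, 1, 3)
)
groups <- split(DF, DF[c("SITE", "YEAR", "SP")], drop = TRUE)

names(groups)    # "SITE.YEAR.SP" labels, one per chunk

# Access a single chunk by position or by name
groups[[1]]
groups[["RNN027-Estagnol 06.2011.Aricia agestis D., 1775"]]

# Modify every chunk and store it back under the same name
for (nm in names(groups)) {
  d <- groups[[nm]]
  groups[[nm]] <- d[order(d$DATE), ]   # stand-in change: sort by DATE
}
```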