1

I have some complicate dataframe manipulation to do, and i have no idea how to do it.

Here is a part my data frame:

    YEAR    RN      DATE    NOM         SITE                LONG    SP               SUMNB  NB100
1   2011    RNN027  15056   ESTAGNOL    RNN027-Estagnol 02  310 Anthocharis cardamines (Linnaeus, 1758) 1   0.3225806
2   2011    RNN027  15075   ESTAGNOL    RNN027-Estagnol 02  310 Anthocharis cardamines (Linnaeus, 1758) 1   0.3225806
3   2003    RNN027  12166   ESTAGNOL    RNN027-Estagnol 03  330 Anthocharis cardamines (Linnaeus, 1758) 2   0.6060606
4   2006    RNN027  13252   ESTAGNOL    RNN027-Estagnol 03  330 Anthocharis cardamines (Linnaeus, 1758) 2   0.6060606
5   2006    RNN027  13257   ESTAGNOL    RNN027-Estagnol 03  330 Anthocharis cardamines (Linnaeus, 1758) 2   0.6060606
6   2005    RNN027  12895   ESTAGNOL    RNN027-Estagnol 01  540 Anthocharis cardamines (Linnaeus, 1758) 2   0.3703704
7   2005    RNN027  12910   ESTAGNOL    RNN027-Estagnol 01  540 Anthocharis cardamines (Linnaeus, 1758) 2   0.3703704
8   2011    RNN027  15075   ESTAGNOL    RNN027-Estagnol 01  540 Anthocharis cardamines (Linnaeus, 1758) 1   0.1851852
9   2008    RNN027  14120   ESTAGNOL    RNN027-Estagnol 05  960 Anthocharis cardamines (Linnaeus, 1758) 2   0.2083333
10  2011    RNN027  15065   ESTAGNOL    RNN027-Estagnol 05  960 Anthocharis cardamines (Linnaeus, 1758) 1   0.1041667
11  2011    RNN027  15075   ESTAGNOL    RNN027-Estagnol 05  960 Anthocharis cardamines (Linnaeus, 1758) 1   0.1041667
12  2007    RNN027  13679   ESTAGNOL    RNN027-Estagnol 05  960 Apatura ilia (Denis & Schifferm�ller, 1775) 2   0.2083333
13  2004    RNN027  12573   ESTAGNOL    RNN027-Estagnol 06  260 Aporia crataegi (Linnaeus, 1758)    2   0.7692308
14  2005    RNN027  12917   ESTAGNOL    RNN027-Estagnol 06  260 Aporia crataegi (Linnaeus, 1758)    2   0.7692308
15  2006    RNN027  13301   ESTAGNOL    RNN027-Estagnol 06  260 Aporia crataegi (Linnaeus, 1758)    2   0.7692308

This DF compile the observation of species (SP) in a bunch of sites (SITE) for each year of a period (YEAR).

I need to produce dummy rows in order to further compute an abundance index. I need some dummy rows for each existing YEAR*SITE*SP combination. I insist on the "existing" because the SP list is different every YEAR*SITE, and by that i mean that i dont know the lentgh of the SP vector at each subset.

Here is the first step of complication: In order to compute my abundance Index, i need to determine the protocol period by searching the first and last date of each YEAR*SITE subset, regardless of the species. So find two dates for every subset.

Second step: Within a subset, for each X species of the subset, i need to create two rows using the first and last dates found in step one. These rows must have the same format than the input data frame (9 rows). So the YEAR, RN, SITE columns will be filled by the right values (current year and site combination). The DATE column filled by first then last date of protocol, the SP filled by the X species name and finally SUMNB and NB100 set to zero (dummy observations).

My point is to create a new dataframe with all these dummy rows (new dummy observations) to further merge it with my input dataframe.

Here is an example of the dummy rows for two species named 1 and 2 within the subset YEAR=i and SITE=y:

    YEAR     RN      DATE        NOM         SITE    LONG   SP       SUMNB  NB100
1   YEAR i   RNN027  FIRST DATE  ESTAGNOL    SITE y  310    SPECIES 1 0     0
2   YEAR i   RNN027  LAST DATE   ESTAGNOL    SITE y  310    SPECIES 1 0     0
3   YEAR i   RNN027  FIRST DATE  ESTAGNOL    SITE y  310    SPECIES 2 0     0
4   YEAR i   RNN027  LAST DATE   ESTAGNOL    SITE y  310    SPECIES 2 0     0

Here is the steps i think are necessary:

  • 1- Create an empty dataframe in order to collect the rows
  • 2- Subseting YEAR*SITE combinations (maybe applyfunction or function call
  • 3- min and max DATE value of the subset
  • 4- Do a for loop (1 to length of SP list) for each subset
  • 5- Within the loop, create two rows, gather the data from the input DF in order to fill all columns with current values of vectors. Has to be done for min and max dates.
  • 6- Fill the empty dataframe with output.

I hope its clear enough! I'll explain more if it has to be.

Thanks for your help!

EDIT: here is a dput of the 50 first rows of my dataset, but it's a long chunks of code:

> dput(head(AGG100,50))
structure(list(YEAR = structure(c(18L, 18L, 10L, 13L, 13L, 12L, 
12L, 18L, 15L, 18L, 18L, 14L, 11L, 12L, 13L, 16L, 16L, 17L, 17L, 
17L, 12L, 12L, 12L, 13L, 13L, 13L, 18L, 13L, 12L, 12L, 12L, 12L, 
13L, 13L, 13L, 14L, 14L, 15L, 15L, 17L, 17L, 17L, 18L, 10L, 11L, 
12L, 12L, 12L, 13L, 13L), .Label = c("1994", "1995", "1996", 
"1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", 
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012"
), class = "factor"), RN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("RNN027", 
"RNN037", "RNN044", "RNN046", "RNN060", "RNN066", "RNN068", "RNN078", 
"RNN079", "RNN089", "RNN130", "RNN144", "RNN157", "RNR117", "RNR189"
), class = "factor"), DATE = structure(c(15056, 15075, 12166, 
13252, 13257, 12895, 12910, 15075, 14120, 15065, 15075, 13679, 
12573, 12917, 13301, 14383, 14393, 14741, 14750, 14761, 12917, 
12925, 12930, 13279, 13286, 13295, 15106, 13286, 12917, 12925, 
12930, 12937, 13271, 13286, 13295, 13644, 13655, 14013, 14029, 
14741, 14750, 14761, 15116, 12195, 12573, 12917, 12925, 12937, 
13271, 13286), class = "Date"), NOM = structure(c(7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("ANCIENNES        CARRIERES D'ORIVAL", 
"BAIE DE L'AIGUILLON (VENDEE)", "CHERINE", "COMBE LAVAUX-JEAN ROLAND", 
"COTE DE MANCY", "Espace protégé - code", "ESTAGNOL", "GRAND PIERRE ET VITAIN", 
"ILE DE LA PLATIERE", "LAC DE REMORAY", "MARAIS DE LAVOURS", 
"PETITE CAMARGUE ALSACIENNE", "PINAIL", "RAMIERES DU VAL DE DROME", 
"RAVIN DE VALBOIS", "TOURBIERE DES DAUGES"), class = "factor"), 
SITE = structure(c(3L, 3L, 4L, 4L, 4L, 2L, 2L, 2L, 6L, 6L, 
6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Libellé du site", 
"RNN027-Estagnol 01", "RNN027-Estagnol 02", "RNN027-Estagnol 03", 
"RNN027-Estagnol 04", "RNN027-Estagnol 05", "RNN027-Estagnol 06", 
"RNN037-GPV 01", "RNN037-GPV 02", "RNN037-GPV 03", "RNN037-GPV 04", 
"RNN037-GPV 05", "RNN044-Pinail A", "RNN044-Pinail B", "RNN044-Pinail C", 
"RNN044-Pinail D", "RNN044-Pinail E", "RNN044-Pinail F", 
"RNN044-Pinail G", "RNN044-Pinail I", "RNN044-Pinail J", 
"RNN044-Pinail K", "RNN046-Remoray 01", "RNN046-Remoray 02", 
"RNN046-Remoray 03", "RNN046-Remoray 04", "RNN046-Remoray 05", 
"RNN046-Remoray 06", "RNN046-Remoray 07", "RNN046-Remoray 08", 
"RNN046-Remoray 09", "RNN046-Remoray 10", "RNN060-PCA Canal", 
"RNN060-PCA Heid", "RNN060-PCA Luzernière", "RNN060-PCA Mitlere-au", 
"RNN060-PCA Petite Heid", "RNN066-Valbois Ourlet bas", "RNN066-Valbois Ourlet haut", 
"RNN066-Valbois Pel Humbert", "RNN066-Valbois Pel Martin", 
"RNN066-Valbois Pel Podgo", "RNN066-Valbois Pel Podgo corniche", 
"RNN066-Valbois Pel temoin", "RNN066-Valbois Vignes parc2bas", 
"RNN066-Valbois Vignes parc2haut", "RNN066-Valbois Vignes parc3bas", 
"RNN066-Valbois Vignes parc3haut", "RNN068-Marais Lavours beon1", 
"RNN068-Marais Lavours beon2", "RNN068-Marais Lavours beon3", 
"RNN068-Marais Lavours beon4", "RNN068-Marais Lavours beon5", 
"RNN068-Marais Lavours ceyzerieu1", "RNN068-Marais Lavours culoz1", 
"RNN078-Chérine 01", "RNN078-Chérine 02", "RNN078-Chérine 03", 
"RNN078-Chérine 04", "RNN078-Chérine 05", "RNN078-Chérine 06", 
"RNN078-Chérine 07", "RNN078-Chérine 08", "RNN078-Chérine 09", 
"RNN078-Chérine 10", "RNN078-Chérine 11", "RNN078-Chérine 12", 
"RNN079-Platière 01LGRA01", "RNN079-Platière 01LGRA03", "RNN079-Platière 01LGRA04", 
"RNN079-Platière 01LGRA07", "RNN079-Platière 01LGRA08", "RNN079-Platière 01LGRA11", 
"RNN079-Platière 01LGRA12", "RNN079-Platière 01LPGO01", "RNN079-Platière 01LPGO02", 
"RNN079-Platière 01LPGO03", "RNN079-Platière 01LPGO04", "RNN079-Platière 01LPGO05", 
"RNN079-Platière 01LPGO06", "RNN079-Platière 01LPGO07", "RNN079-Platière 01LPOV01", 
"RNN079-Platière 01LPOV02", "RNN079-Platière 01LPOV03", "RNN079-Platière 01LPOV04", 
"RNN079-Platière 01LPOV05", "RNN079-Platière 01LPOV06", "RNN089-Ramières 24", 
"RNN089-Ramières 25", "RNN089-Ramières 26", "RNN089-Ramières 27", 
"RNN089-Ramières 35", "RNN089-Ramières 36", "RNN089-Ramières 37", 
"RNN089-Ramières 43", "RNN089-Ramières 46", "RNN089-Ramières 47", 
"RNN130-Baie Aiguillon Charron-Casiers-vase", "RNN130-Baie Aiguillon Charron-Mizottes-Chaînes", 
"RNN130-Baie Aiguillon Charron-Pd-Digue-mer", "RNN130-Baie Aiguillon Triaize-Mizottes", 
"RNN130-Baie Aiguillon Triaize-Pd-Digue-1", "RNN130-Baie Aiguillon Triaize-Pd-Digue-2", 
"RNN144-Dauges 01", "RNN144-Dauges 02", "RNN144-Dauges 03", 
"RNN144-Dauges 04", "RNN144-Dauges 05", "RNN144-Dauges 06", 
"RNN144-Dauges 07", "RNN144-Dauges 08", "RNN144-Dauges 09", 
"RNN144-Dauges 10", "RNN144-Dauges 11", "RNN144-Dauges 12", 
"RNN144-Dauges 13", "RNN144-Dauges 14", "RNN144-Dauges 15", 
"RNN157-Combe Lavaux 01 Sentier des crêtes", "RNN157-Combe Lavaux 02 Sentier des dalles", 
"RNN157-Combe Lavaux 03 Plain des Essoyottes partie humide", 
"RNN157-Combe Lavaux 04 Plain des Essoyottes partie sèches", 
"RNN157-Combe Lavaux 05 Pelouses de Fontenottes", "RNN157-Combe Lavaux 06 Friches parc de nuit", 
"RNN157-Combe Lavaux 07 Friches", "RNN157-Combe Lavaux 08 Friches ouverture ONF", 
"RNN157-Combe Lavaux 09 Haut du Champ Sement", "RNN157-Combe Lavaux 10 Bas du Champ Sement", 
"RNR117-TRMANCY 01", "RNR117-TRMANCY 02", "RNR117-TRMANCY 03", 
"RNR117-TRMANCY 04", "RNR117-TRMANCY 05", "RNR117-TRMANCY 06", 
"RNR189-RNR_TRACO_01", "RNR189-RNR_TRACO_02", "RNR189-RNR_TRACO_03", 
"RNR189-RNR_TRACO_04", "RNR189-RNR_TRACO_05", "RNR189-RNR_TRACO_06a", 
"RNR189-RNR_TRACO_06b", "RNR189-RNR_TRACO_07", "RNR189-RNR_TRACO_08", 
"RNR189-RNR_TRACO_09a", "RNR189-RNR_TRACO_09b", "RNR189-RNR_TRACO_10", 
"RNR189-RNR_TRACO_11", "RNR189-RNR_TRACO_12", "RNR189-RNR_TRACO_13", 
"RNR189-RNR_TRACO_14", "RNR189-RNR_TRACO_15"), class = "factor"), 
LONG = c(310, 310, 330, 330, 330, 540, 540, 540, 960, 960, 
960, 960, 260, 260, 260, 260, 260, 260, 260, 260, 310, 310, 
310, 310, 310, 310, 310, 330, 540, 540, 540, 540, 540, 540, 
540, 540, 540, 540, 540, 540, 540, 540, 540, 710, 710, 710, 
710, 710, 710, 710), SP = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L
), .Label = c("Aglais urticae (Linnaeus, 1758)", "Anthocharis cardamines (Linnaeus, 1758)", 
"Anthocharis euphenoides Staudinger, 1869", "Apatura ilia (Denis & Schiffermüller, 1775)", 
"Apatura iris (Linnaeus, 1758)", "Aphantopus hyperanthus (Linnaeus, 1758)", 
"Aphantopus hyperantus (Linnaeus, 1758)", "Aporia crataegi (Linnaeus, 1758)", 
"Araschnia levana (Linnaeus, 1758)", "Arethusana arethusa (Denis & Schiffermüller, 1775)", 
"Argynnis adippe (Denis & Schiffermüller, 1775)", "Argynnis aglaja (Linnaeus, 1758)", 
"Argynnis paphia (Linnaeus, 1758)", "Argynnis paphia paphia (Linnaeus, 1758)", 
"Aricia agestis (Denis & Schiffermüller, 1775)", "Aricia agestis D., 1775", 
"Boloria dia (Linnaeus, 1767)", "Boloria euphrosyne (Linnaeus, 1758)", 
"Boloria selene (Denis & Schiffermüller, 1775)", "Brenthis daphne (Bergsträsser, 1780)", 
"Brenthis ino (Rottemburg, 1775)", "Brintesia circe (Fabricius, 1775)", 
"Callophrys rubi (Linnaeus, 1758)", "Carcharodus alceae (Esper, 1780)", 
"Carcharodus floccifera (Zeller, 1847)", "Carcharodus Hübner, 1819", 
"Carcharodus lavatherae (Esper, 1783)", "Carterocephalus palaemon (Pallas, 1771)", 
"Celastrina argiolus (Linnaeus, 1758)", "Charaxes jasius (Linnaeus, 1767)", 
"Chazara briseis (Linnaeus, 1764)", "Cinclidia phoebe (Denis & Schiffermüller, 1775)", 
"Clossiana dia (Linnaeus, 1767)", "Clossiana euphrosyne (Linnaeus, 1758)", 
"Clossiana selene (Denis & Schiffermüller, 1775)", "Coenonympha arcania (Linnaeus, 1761)", 
"Coenonympha dorus (Esper, 1782)", "Coenonympha glycerion (Borkhausen, 1788)", 
"Coenonympha Hübner, 1819", "Coenonympha oedippus (Fabricius, 1787)", 
"Coenonympha pamphilus (Linnaeus, 1758)", "Coenonympha tullia (Müller, 1764)", 
"Colias alfacariensis Ribbe, 1905", "Colias crocea Geoffroy, 1785", 
"Colias croceus (Fourcroy, 1785)", "Colias Fabricius, 1807", 
"Colias hyale (Linnaeus, 1758)", "Colias palaeno (Linnaeus, 1761)", 
"Colias PC (hyale / alfacariensis) #complexe", "Cupido alcetas (Hoffmannsegg, 1804)", 
"Cupido argiades (Pallas, 1771)", "Cupido minimus (Fuessly, 1775)", 
"Cupido osiris (Meigen, 1829)", "Cyaniris semiargus (Rottemburg, 1775)", 
"Cyaniris semiargus (Rottemburg, 1775) (doublon1)", "Cynthia cardui (Linnaeus, 1758)", 
"Diacrisia sannio (Linnaeus, 1758)", "Didymaeformia didyma (Esper, 1778)", 
"Erebia aethiops (Esper, 1777)", "Erebia euryale (Esper, 1805)", 
"Erebia ligea (Linnaeus, 1758)", "Erebia medusa (Denis & Schiffermüller, 1775)", 
"Erebia meolans (de Prunner, 1798)", "Erynnis tages (Linnaeus, 1758)", 
"Euchloe crameri Butler, 1869", "Euchloe PC (ausonia / simplonia) #complexe", 
"Euclidia glyphica (Linnaeus, 1758)", "Euphydryas aurinia (Rottemburg, 1775)", 
"Euplagia quadripunctaria (Poda, 1761)", "Everes alcetas (Hoffmannsegg, 1804)", 
"Everes argiades (Pallas, 1771)", "Fabriciana adippe (Denis & Schiffermüller, 1775)", 
"Glaucopsyche alcon (Denis & Schiffermüller, 1775)", "Glaucopsyche alexis (Poda, 1761)", 
"Glaucopsyche arion (Linnaeus, 1758)", "Glaucopsyche melanops (Boisduval, 1828)", 
"Glaucopsyche nausithous (Bergsträsser, 1779)", "Glaucopsyche teleius (Bergsträsser, 1779)", 
"Gonepteryx cleopatra (Linnaeus, 1767)", "Gonepteryx rhamni (Linnaeus, 1758)", 
"Hamearis lucina (Linnaeus, 1758)", "Heodes tityrus (Poda, 1761)", 
"Hesperia comma (Linnaeus, 1758)", "Hesperia Fabricius, 1793", 
"Hesperiidae Latreille, 1809", "Heteropterus morpheus (Pallas, 1771)", 
"Hipparchia C (alcyone / genava / fagi) #complexe", "Hipparchia fagi (Scopoli, 1763)", 
"Hipparchia fidia (Linnaeus, 1767)", "Hipparchia genava (Fruhstorfer, 1908)", 
"Hipparchia semele (Linnaeus, 1758)", "Hyles euphorbiae (Linnaeus, 1758)", 
"Hyponephele lupinus (O. Costa, 1836)", "Inachis io (Linnaeus, 1758)", 
"Iphiclides podalirius (Linnaeus, 1758)", "Iphiclides podalirius (Scopoli, 1763)", 
"Issoria lathonia (Linnaeus, 1758)", "Ladoga camilla (Linnaeus, 1764)", 
"Lampides boeticus (Linnaeus, 1767)", "Lasiommata C (megera / maera) #complexe", 
"Lasiommata maera (Linnaeus, 1758)", "Lasiommata megera (Linnaeus, 1767)", 
"Leptidea Billberg, 1820", "Leptidea C (sinapis / reali) #complexe", 
"Leptidea sinapis (Linnaeus, 1758)", "Leptotes pirithous (Linnaeus, 1767)", 
"Limenitis camilla (Linnaeus, 1764)", "Limenitis Fabricius, 1807", 
"Limenitis populi (Linnaeus, 1758)", "Limenitis reducta Staudinger, 1901", 
"Lopinga achine (Scopoli, 1763)", "Lycaena alciphron (Rottemburg, 1775)", 
"Lycaena dispar (Haworth, 1802)", "Lycaena helle (Denis & Schiffermüller, 1775)", 
"Lycaena hippothoe (Linnaeus, 1761)", "Lycaena phlaeas (Linnaeus, 1761)", 
"Lycaena tityrus (Poda, 1761)", "Lycaenidae Leach, 1815", 
"Macroglossum stellatarum (Linnaeus, 1758)", "Maculinea alcon (Denis & Schiffermüller, 1775)", 
"Maculinea arion (Linnaeus, 1758)", "Maculinea Eecke, 1915", 
"Maculinea nausithous (Bergsträsser, 1779)", "Maculinea teleius (Bergsträsser, 1779)", 
"Maniola jurtina (Linnaeus, 1758)", "Melanargia arge (Sulzer, 1776)", 
"Melanargia galathea (Linnaeus, 1758)", "Melanargia lachesis (Hübner, 1790)", 
"Melanargia occitanica (Esper, 1793)", "Melitaea athalia (Rottemburg, 1775)", 
"Melitaea cinxia (Linnaeus, 1758)", "Melitaea diamina (Lang, 1789)", 
"Melitaea didyma (Esper, 1778)", "Melitaea Fabricius, 1807", 
"Melitaea parthenoides Keferstein, 1851", "Melitaea phoebe (Denis & Schiffermüller, 1775)", 
"Mellicta", "Mellicta athalia (Rottemburg, 1775)", "Mellicta C (athalia / deione / parthenoides) #complexe", 
"Mellicta parthenoides (Keferstein, 1851)", "Mesoacidalia aglaja (Linnaeus, 1758)", 
"Minois dryas (Scopoli, 1763)", "Neohipparchia statilinus (Hufnagel, 1766)", 
"Neozephyrus quercus (Linnaeus, 1758)", "Nom latin du taxon avec descripteur", 
"Nymphalis antiopa (Linnaeus, 1758)", "Nymphalis polychloros (Linnaeus, 1758)", 
"Ochlodes sylvanus (Esper, 1777)", "Ochlodes venatus (Bremer & Grey, 1853)", 
"Ochlodes venatus faunus (Turati, 1905)", "Palaeochrysophanus hippothoe Linne, 1761", 
"Papilio machaon Linnaeus, 1758", "Pararge aegeria (Linnaeus, 1758)", 
"Pieris 2 (rapae / mannii / napi) #complexe", "Pieris brassicae (Linnaeus, 1758)", 
"Pieris mannii (Mayer, 1851)", "Pieris napi (Linnaeus, 1758)", 
"Pieris napi / rapae #complexe", "Pieris PC (rapae / mannii) #complexe", 
"Pieris rapae (Linnaeus, 1758)", "Pieris Schrank, 1801", 
"Plebeius agestis (Denis & Schiffermüller, 1775)", "Plebeius argus (Linnaeus, 1758)", 
"Plebeius argyrognomon (Bergsträsser, 1779)", "Plebejus argus (Linnaeus, 1758)", 
"Plebejus argyrognomon (Bergsträsser, 1779)", "Plebejus idas (Linnaeus, 1761)", 
"Plebejus Kluk, 1802", "Polygonia c-album (Linnaeus, 1758)", 
"Polyommatus bellargus (Rottemburg, 1775)", "Polyommatus C (coridon / hispana) #complexe", 
"Polyommatus coridon (Poda, 1761)", "Polyommatus escheri (Hübner, 1823)", 
"Polyommatus icarus (Rottemburg, 1775)", "Polyommatus Latreille, 1804", 
"Polyommatus semiargus (Rottemburg, 1775)", "Polyommatus thersites (Cantener, 1835)", 
"Pontia daplidice (Linnaeus, 1758)", "Pseudopanthera macularia (Linnaeus, 1758)", 
"Pseudophilotes baton (Bergsträsser, 1779)", "Pseudotergumia fidia (Linnaeus, 1767)", 
"Pyrgus 1 C (malvae / malvoides) #complexe", "Pyrgus 2 C (armoricanus / foulquieri / alveus / onopordi) #complexe", 
"Pyrgus 3 C (serratulae / carlinae / cirsii) #complexe", 
"Pyrgus armoricanus (Oberthür, 1910)", "Pyrgus Hübner, 1819", 
"Pyrgus malvae (Linnaeus, 1758)", "Pyrgus serratulae (Rambur, 1839)", 
"Pyronia bathseba (Fabricius, 1793)", "Pyronia cecilia (Vallantin, 1894)", 
"Pyronia tithonus (Linnaeus, 1767)", "Pyronia tithonus (Linnaeus, 1771)", 
"Quercusia quercus (Linnaeus, 1758)", "Satyridae", "Satyrium acaciae (Fabricius, 1787)", 
"Satyrium esculi (Hübner, 1804)", "Satyrium ilicis (Esper, 1779)", 
"Satyrium pruni (Linnaeus, 1758)", "Satyrium spini (Denis & Schiffermüller, 1775)", 
"Satyrium w-album (Knoch, 1782)", "Smerinthus ocellatus (Linnaeus, 1758)", 
"Speyeria aglaja (Linnaeus, 1758)", "Spialia sertorius (Hoffmannsegg, 1804)", 
"Thecla betulae (Linnaeus, 1758)", "Thymelicus acteon (Rottemburg, 1775)", 
"Thymelicus acteon / lineolus / sylvestris #complexe", "Thymelicus C (sylvestris / lineolus) #complexe", 
"Thymelicus Hübner, 1819", "Thymelicus lineola (Ochsenheimer, 1808)", 
"Thymelicus sylvestris (Poda, 1761)", "Vanessa atalanta (Linnaeus, 1758)", 
"Vanessa cardui (Linnaeus, 1758)", "Zerynthia polyxena (Denis & Schiffermüller, 1775)", 
"Zygaena carniolica (Scopoli, 1763)", "Zygaena ephialtes (Linnaeus, 1767)", 
"Zygaena erythrus (Hübner, 1806)", "Zygaena Fabricius, 1775", 
"Zygaena fausta (Linnaeus, 1767)", "Zygaena filipendulae (Linnaeus, 1758)", 
"Zygaena hilaris Ochsenheimer, 1808", "Zygaena lonicerae (Scheven, 1777)", 
"Zygaena loti (Denis & Schiffermüller, 1775)", "Zygaena occitanica (Villers, 1789)", 
"Zygaena purpuralis (Brünnich, 1763)", "Zygaena sarpedon (Hübner, 1790)", 
"Zygaena transalpina (Esper, 1780)", "Zygaena trifolii (Esper, 1783)", 
"Zygaena viciae (Denis & Schiffermüller, 1775)"), class = "factor"), 
SUMNB = c(1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 
2, 1, 2, 1, 6, 6, 4, 4, 6, 2, 1, 6, 10, 6, 12, 2, 6, 10, 
2, 4, 2, 2, 2, 2, 11, 1, 1, 2, 2, 6, 2, 2, 4, 2), NB100 = c(0.32258064516129, 
0.32258064516129, 0.606060606060606, 0.606060606060606, 0.606060606060606, 
0.37037037037037, 0.37037037037037, 0.185185185185185, 0.208333333333333, 
0.104166666666667, 0.104166666666667, 0.208333333333333, 
0.769230769230769, 0.769230769230769, 0.769230769230769, 
0.769230769230769, 0.769230769230769, 0.384615384615385, 
0.769230769230769, 0.384615384615385, 1.93548387096774, 1.93548387096774, 
1.29032258064516, 1.29032258064516, 1.93548387096774, 0.645161290322581, 
0.32258064516129, 1.81818181818182, 1.85185185185185, 1.11111111111111, 
2.22222222222222, 0.37037037037037, 1.11111111111111, 1.85185185185185, 
0.37037037037037, 0.740740740740741, 0.37037037037037, 0.37037037037037, 
0.37037037037037, 0.37037037037037, 2.03703703703704, 0.185185185185185, 
0.185185185185185, 0.28169014084507, 0.28169014084507, 0.845070422535211, 
0.28169014084507, 0.28169014084507, 0.563380281690141, 0.28169014084507
)), .Names = c("YEAR", "RN", "DATE", "NOM", "SITE", "LONG", 
"SP", "SUMNB", "NB100"), row.names = c(NA, 50L), class = "data.frame")
user2542995
  • 241
  • 2
  • 4
  • 11
  • 3
    Please use `dput` to provide a reproducible example. – Sven Hohenstein Jan 07 '14 at 11:45
  • 5
    You have very specific ideas about what you want to do. However, maybe it would be better to step back and let others choose a different (more efficient) way to your actual goal (the abundance index). I suggest to rework your question accordingly. If you do so, you should follow @SvenHohenstein's advice. See the [FAQ](http://stackoverflow.com/a/5963610/1412059) for more information about reproducible examples. – Roland Jan 07 '14 at 11:52
  • @Roland: I just provide some ideas about how i would proceed, not necessarly the steps to follow. Since my question is a bit hard to understand, i thought these steps were not useless. – user2542995 Jan 07 '14 at 13:55
  • But you don't tell us, how the abundance index is calculated. Just some stuff about `for` loops, which most experienced R users don't use for tasks like this. Instead the [split-apply-combine approach](http://www.jstatsoft.org/v40/i01/paper) (which is implemented in quite a few different functions and packages) should probably be used for your problem. – Roland Jan 07 '14 at 14:59
  • @Roland: I didnt explained how calculate the index because i was still on formatting step. Since the index use time differences, i had to add dummy data before and after observations in order to not underestimate the index. Calculate the index is a whole another problem. Now thanks to BrodieG, i can work on the formula rather than the data manipulation. Thanks anyway! – user2542995 Jan 08 '14 at 11:30

1 Answers1

2

data.table is well suited for this type of analysis. The syntax gets some getting used to, so if you're not familiar with it the following code may be hard to understand, but hopefully the result is compelling enough you are interested in learning more about data.table. Note that while the following looks like a lot of code, most of it is just explanatory comments. Most of the work is done in four lines of code, which could have been condensed further at the cost of clarity:

library(data.table)
dt <- data.table(df)  # convert to data table

# step1, get min/max dates for YEAR/SITE

dt.dates <- dt[, list(DATE=range(DATE)), by=list(YEAR, SITE)]  

# Get unique set of YEAR/SITE/SP (.SD is a special data table variable that
# refers to all columns that are not otherwise referenced in the statement,
# the `by` argument essentially means split and group by YEAR, SITE, SP),
# so here I'm taking the first row of every column except `DATE` for each
# YEAR/SITE/SP combination)

dt.site.sp <- dt[, .SD[1, names(.SD)[names(.SD) != "DATE"], with=F], by=list(YEAR, SITE, SP)] 

# Join min/max dates for YEAR/SITE to YEAR/SITE/SP unique list; this will
# cause every unique YEAR/SITE/SP value to be matched to the min/max dates
# for every YEAR/SITE; note that in your data some YEAR/SITE only have one
# date.

setkey(dt.dates, YEAR, SITE) # this allows the join   
dt.final <- dt.dates[dt.site.sp, allow.cartesian=T]

# now cleanup

dt.final <- dt.final[, names(dt), with=F][order(YEAR, SITE, SP)]  # reorder columns and rows
dt.final[, SUMNB:=0]      # set value column to 0
dt.final[, NB100:=0]      # set value column to 0
dt.final[YEAR==2005]

Which produces:

#     YEAR     RN       DATE      NOM               SITE LONG                                      SP SUMNB NB100
#  1: 2005 RNN027 2005-04-22 ESTAGNOL RNN027-Estagnol 01  540 Anthocharis cardamines (Linnaeus, 1758)     0     0
#  2: 2005 RNN027 2005-06-03 ESTAGNOL RNN027-Estagnol 01  540 Anthocharis cardamines (Linnaeus, 1758)     0     0
#  3: 2005 RNN027 2005-04-22 ESTAGNOL RNN027-Estagnol 01  540        Aporia crataegi (Linnaeus, 1758)     0     0
#  4: 2005 RNN027 2005-06-03 ESTAGNOL RNN027-Estagnol 01  540        Aporia crataegi (Linnaeus, 1758)     0     0
#  5: 2005 RNN027 2005-05-14 ESTAGNOL RNN027-Estagnol 02  310        Aporia crataegi (Linnaeus, 1758)     0     0
#  6: 2005 RNN027 2005-05-27 ESTAGNOL RNN027-Estagnol 02  310        Aporia crataegi (Linnaeus, 1758)     0     0
#  7: 2005 RNN027 2005-05-14 ESTAGNOL RNN027-Estagnol 04  710        Aporia crataegi (Linnaeus, 1758)     0     0
#  8: 2005 RNN027 2005-06-03 ESTAGNOL RNN027-Estagnol 04  710        Aporia crataegi (Linnaeus, 1758)     0     0
#  9: 2005 RNN027 2005-05-14 ESTAGNOL RNN027-Estagnol 06  260        Aporia crataegi (Linnaeus, 1758)     0     0
# 10: 2005 RNN027 2005-05-14 ESTAGNOL RNN027-Estagnol 06  260        Aporia crataegi (Linnaeus, 1758)     0     0
BrodieG
  • 51,669
  • 9
  • 93
  • 146
  • It seems to be working perfectly! I had some doubts about the SP list but after comparing the output with my raw data, it's ok. Thanks a lot! One last thing i forget is that i need to modify the min and max dates: min - 7 and max +7 days. Since you are using range(), i'm not quite sure about how i can do that. Thanks anyway! – user2542995 Jan 08 '14 at 11:24
  • You should be able to replace the `dt.dates` step with: `dt.dates <- dt[, list(DATE=c(min(DATE) - 7, max(DATE) + 7)), by=list(YEAR, SITE)]`. This basically breaks your data into YEAR/SITE groups, and for each group, creates a two length vector containing min(DATE)-7 and max(DATE)+7, which then gets reassembled into a data table with two rows per group by adding back the grouping columns. – BrodieG Jan 08 '14 at 13:44
  • I didnt saw i had an error in dt.final <- [, names(dt), with=F][order(YEAR, SITE, SP)] Error: unexpected '[' in "dt.final <- [". It's strange because it produces the right table anyway. – user2542995 Jan 08 '14 at 14:37
  • Sorry, was just missing a `dt.final` ahead of the `[`. This worked because that step was just reordering stuff, so all the hard work was done already. Fixed it in the answer. – BrodieG Jan 08 '14 at 14:45