0

I have been learning r for a few days, and I am trying to do a specific output and I can not figure out how to select days prior to an event.

I am trying to determine how recharge events impact detections of a containment in water samples. I have data that has 7 columns (Date, month, day, year, samp, prcip, snow) where the date is YYYY/MM/DD, month, day, and year are what they say, samp has a 0, 1, or NA, and both precip and snow have daily totals for either rain or snow. There is 1 row for each day.

I want to explore rain events for different number of days prior to a sampling event. I would like to select the rows with either a 0 (sample and no detection) or 1 (sample with detection) and then select a number of days prior, say 5 for this example, and compare the mean, sum, etc of those prior days between the 0 and 1 groups.

I have found ways to select just the rows, and to get a count of how many consecutive days have either a 0 or a 1 (How to subset consecutive rows if they meet a condition), but I can not figure out how to select a number of days prior to a sampling event and create a new table.

I have over 800 sampling days spread over 20+ years, so I do not want to input each date either (Subset dataframe where date is within x days of a vector of dates in R).

I have tried using the %>% pipe, and some other selection methods and have been able to select the rows containing 0 or 1, but I don't understand how to also get the days prior to the sampling day. I am looking for any direction, suggestions, or functions/tools/packages to look up as I have come up short on new avenues to explore.

I would like to select my data so I can eventually run some simple statistics on it. This is an exploratory project for me - to learn r and practice statistics. I would like to run t-tests, and ANOVA, look at different days prior to sampling. I am selecting sample days so I can eventually ask a question such as "how does rainfall 5 days prior to a sampling event impact detections" or "how is the mean of rain 5 days prior to a positive detection different from 5 days before a negative detection". Hopefully giving you context of my purpose helps me to explain myself and what I am looking for.

How my data looks now:

date    month   day year    samp    precip  snow

11/11/1988 11 11 1988 NA 0 0 11/12/1988 11 12 1988 NA 0 0 11/13/1988 11 13 1988 NA 0.55 0 11/14/1988 11 14 1988 NA 0 0 11/15/1988 11 15 1988 NA 0 0 11/16/1988 11 16 1988 NA 0.52 0 11/17/1988 11 17 1988 NA 0 0 11/18/1988 11 18 1988 NA 0 0 11/19/1988 11 19 1988 NA 0 0 11/20/1988 11 20 1988 NA 0.39 0 11/21/1988 11 21 1988 NA 0.43 0 11/22/1988 11 22 1988 NA 0 0 11/23/1988 11 23 1988 NA 0 0 11/24/1988 11 24 1988 NA 0 0 11/25/1988 11 25 1988 NA 0 0 11/26/1988 11 26 1988 NA 0.11 0 11/27/1988 11 27 1988 NA 0.08 0 11/28/1988 11 28 1988 NA 0.01 0 11/29/1988 11 29 1988 NA 0 0 11/30/1988 11 30 1988 NA 0 0 12/1/1988 12 1 1988 NA 0 0 12/2/1988 12 2 1988 NA 0 0 12/3/1988 12 3 1988 NA 0 0 12/4/1988 12 4 1988 NA 0 0 12/5/1988 12 5 1988 NA 0 0 12/6/1988 12 6 1988 NA 0 0 12/7/1988 12 7 1988 NA 0 0 12/8/1988 12 8 1988 NA 0 0 12/9/1988 12 9 1988 NA 0 0 12/10/1988 12 10 1988 NA 0 0 12/11/1988 12 11 1988 NA 0 0 12/12/1988 12 12 1988 NA 0 0 12/13/1988 12 13 1988 NA 0.03 1 12/14/1988 12 14 1988 NA 0 0 12/15/1988 12 15 1988 NA 0 0 12/16/1988 12 16 1988 NA 0 0 12/17/1988 12 17 1988 NA 0 2 12/18/1988 12 18 1988 NA 0 0 12/19/1988 12 19 1988 NA 0 0 12/20/1988 12 20 1988 NA 0.07 0 12/21/1988 12 21 1988 NA 0.02 0 12/22/1988 12 22 1988 NA 0 0 12/23/1988 12 23 1988 NA 1.3 0 12/24/1988 12 24 1988 NA 0 0 12/25/1988 12 25 1988 NA 0 0 12/26/1988 12 26 1988 NA 0 0 12/27/1988 12 27 1988 NA 0.85 3 12/28/1988 12 28 1988 NA 0.37 3 12/29/1988 12 29 1988 NA 0 0 12/30/1988 12 30 1988 NA 0 0 12/31/1988 12 31 1988 NA 0 0 1/1/1989 1 1 1989 NA 0 0 1/2/1989 1 2 1989 NA 0 0 1/3/1989 1 3 1989 NA 0 0 1/4/1989 1 4 1989 NA 0 0 1/5/1989 1 5 1989 NA 0 0 1/6/1989 1 6 1989 NA 0.54 0 1/7/1989 1 7 1989 NA 0 0 1/8/1989 1 8 1989 NA 0.08 0 1/9/1989 1 9 1989 NA 0 0 1/10/1989 1 10 1989 NA 0 0 1/11/1989 1 11 1989 NA 0 0 1/12/1989 1 12 1989 NA 0 0 1/13/1989 1 13 1989 NA 0 0 1/14/1989 1 14 1989 NA 0 0 1/15/1989 1 15 1989 NA 0.04 1 1/16/1989 1 16 1989 NA 0 0 1/17/1989 1 17 1989 NA 0 0 1/18/1989 1 18 1989 NA 0 0 1/19/1989 1 19 1989 NA 0 0 1/20/1989 1 20 1989 NA 0 0 1/21/1989 1 21 1989 NA 0 0 1/22/1989 1 22 1989 NA 0 0 1/23/1989 1 23 1989 NA 0 0 1/24/1989 1 24 1989 NA 0 0 1/25/1989 1 25 1989 NA 0 0 1/26/1989 1 26 1989 NA 0.15 0 1/27/1989 1 27 1989 NA 0 0 1/28/1989 1 28 1989 NA 0 0 1/29/1989 1 29 1989 NA 0 0 1/30/1989 1 30 1989 NA 0 0 1/31/1989 1 31 1989 NA 0 0 2/1/1989 2 1 1989 NA 0 0 2/2/1989 2 2 1989 NA 0 0 2/3/1989 2 3 1989 NA 0.01 0 2/4/1989 2 4 1989 NA 0 0 2/5/1989 2 5 1989 NA 0.28 4 2/6/1989 2 6 1989 NA 0.21 3 2/7/1989 2 7 1989 NA 0 0 2/8/1989 2 8 1989 NA 0 0 2/9/1989 2 9 1989 NA 0 0 2/10/1989 2 10 1989 NA 0 0 2/11/1989 2 11 1989 NA 0 0 2/12/1989 2 12 1989 NA 0 0 2/13/1989 2 13 1989 NA 0.26 1 2/14/1989 2 14 1989 NA 0 0 2/15/1989 2 15 1989 NA 0.04 0 2/16/1989 2 16 1989 NA 0.03 1 2/17/1989 2 17 1989 NA 0 0 2/18/1989 2 18 1989 NA 0 0 2/19/1989 2 19 1989 NA 0 0 2/20/1989 2 20 1989 NA 0 0 2/21/1989 2 21 1989 NA 0.21 2 2/22/1989 2 22 1989 NA 0 0 2/23/1989 2 23 1989 NA 0 0 2/24/1989 2 24 1989 NA 0 0 2/25/1989 2 25 1989 NA 0 0 2/26/1989 2 26 1989 NA 0 0 2/27/1989 2 27 1989 NA 0 0 2/28/1989 2 28 1989 NA 0 0 3/1/1989 3 1 1989 1 0 0 3/2/1989 3 2 1989 NA 0 0 3/3/1989 3 3 1989 NA 0 0 3/4/1989 3 4 1989 NA 0 0 3/5/1989 3 5 1989 NA 0.34 0 3/6/1989 3 6 1989 NA 0 0 3/7/1989 3 7 1989 NA 0 0 3/8/1989 3 8 1989 NA 0 0 3/9/1989 3 9 1989 NA 0 0 3/10/1989 3 10 1989 NA 0 0 3/11/1989 3 11 1989 NA 0 0 3/12/1989 3 12 1989 NA 0 0 3/13/1989 3 13 1989 NA 0 0 3/14/1989 3 14 1989 NA 0 0 3/15/1989 3 15 1989 NA 0 0 3/16/1989 3 16 1989 NA 0 0 3/17/1989 3 17 1989 NA 0 0 3/18/1989 3 18 1989 NA 0.02 0 3/19/1989 3 19 1989 NA 0 0 3/20/1989 3 20 1989 NA 0 0 3/21/1989 3 21 1989 NA 0 0 3/22/1989 3 22 1989 NA 0 0 3/23/1989 3 23 1989 NA 0 0 3/24/1989 3 24 1989 NA 0 0 3/25/1989 3 25 1989 NA 0 0 3/26/1989 3 26 1989 NA 0 0 3/27/1989 3 27 1989 NA 0 0 3/28/1989 3 28 1989 NA 0.02 0 3/29/1989 3 29 1989 NA 0.81 0 3/30/1989 3 30 1989 NA 0 0 3/31/1989 3 31 1989 NA 0 0 4/1/1989 4 1 1989 NA 0 0 4/2/1989 4 2 1989 NA 0.05 0 4/3/1989 4 3 1989 NA 0.81 0 4/4/1989 4 4 1989 NA 0.49 0 4/5/1989 4 5 1989 NA 0 0 4/6/1989 4 6 1989 NA 0 0 4/7/1989 4 7 1989 NA 0 0 4/8/1989 4 8 1989 NA 0 0 4/9/1989 4 9 1989 NA 0.26 0 4/10/1989 4 10 1989 NA 0 0 4/11/1989 4 11 1989 NA 0 0 4/12/1989 4 12 1989 NA 0 0 4/13/1989 4 13 1989 NA 0 0 4/14/1989 4 14 1989 NA 0 0 4/15/1989 4 15 1989 NA 0 0 4/16/1989 4 16 1989 NA 0 0 4/17/1989 4 17 1989 NA 0.27 0 4/18/1989 4 18 1989 NA 0.04 0 4/19/1989 4 19 1989 NA 0 0 4/20/1989 4 20 1989 NA 0 0 4/21/1989 4 21 1989 NA 0 0 4/22/1989 4 22 1989 NA 0 0 4/23/1989 4 23 1989 NA 0 0 4/24/1989 4 24 1989 NA 0 0 4/25/1989 4 25 1989 NA 0 0 4/26/1989 4 26 1989 NA 0 0 4/27/1989 4 27 1989 NA 0.23 0 4/28/1989 4 28 1989 NA 0.28 0 4/29/1989 4 29 1989 NA 0 0 4/30/1989 4 30 1989 NA 0 0 5/1/1989 5 1 1989 NA 0 0 5/2/1989 5 2 1989 NA 0 0 5/3/1989 5 3 1989 0 0.28 0 5/4/1989 5 4 1989 NA 0 0 5/5/1989 5 5 1989 NA 0.06 0 5/6/1989 5 6 1989 NA 0 0 5/7/1989 5 7 1989 NA 0 0 5/8/1989 5 8 1989 NA 0 0 5/9/1989 5 9 1989 NA 0.42 0 5/10/1989 5 10 1989 NA 0.02 0 5/11/1989 5 11 1989 NA 0 0 5/12/1989 5 12 1989 NA 0 0 5/13/1989 5 13 1989 NA 0 0 5/14/1989 5 14 1989 NA 0 0 5/15/1989 5 15 1989 NA 0 0 5/16/1989 5 16 1989 NA 0 0 5/17/1989 5 17 1989 NA 0 0 5/18/1989 5 18 1989 NA 0 0 5/19/1989 5 19 1989 NA 0.05 0 5/20/1989 5 20 1989 NA 1.17 0 5/21/1989 5 21 1989 NA 0 0 5/22/1989 5 22 1989 NA 0 0 5/23/1989 5 23 1989 NA 0.03 0 5/24/1989 5 24 1989 NA 0 0 5/25/1989 5 25 1989 NA 0.21 0 5/26/1989 5 26 1989 NA 0.37 0 5/27/1989 5 27 1989 NA 0 0 5/28/1989 5 28 1989 NA 0 0 5/29/1989 5 29 1989 NA 0 0 5/30/1989 5 30 1989 NA 1.5 0 5/31/1989 5 31 1989 NA 0.14 0 6/1/1989 6 1 1989 1 0.97 0 6/2/1989 6 2 1989 NA 1.04 0 6/3/1989 6 3 1989 NA 0 0 6/4/1989 6 4 1989 NA 0.25 0 6/5/1989 6 5 1989 NA 0 0 6/6/1989 6 6 1989 NA 0 0 6/7/1989 6 7 1989 NA 0 0 6/8/1989 6 8 1989 NA 0 0 6/9/1989 6 9 1989 NA 0 0 6/10/1989 6 10 1989 NA 0 0 6/11/1989 6 11 1989 NA 0 0 6/12/1989 6 12 1989 NA 0.32 0 6/13/1989 6 13 1989 NA 0.16 0 6/14/1989 6 14 1989 NA 0 0

How I would like my data to look once it is completed:

date month day year samp precip snow 2/24/1989 2 24 1989 NA 0 0 2/25/1989 2 25 1989 NA 0 0 2/26/1989 2 26 1989 NA 0 0 2/27/1989 2 27 1989 NA 0 0 2/28/1989 2 28 1989 NA 0 0 3/1/1989 3 1 1989 1 0 0 4/28/1989 4 28 1989 NA 0.28 0 4/29/1989 4 29 1989 NA 0 0 4/30/1989 4 30 1989 NA 0 0 5/1/1989 5 1 1989 NA 0 0 5/2/1989 5 2 1989 NA 0 0 5/3/1989 5 3 1989 0 0.28 0 5/27/1989 5 27 1989 NA 0 0 5/28/1989 5 28 1989 NA 0 0 5/29/1989 5 29 1989 NA 0 0 5/30/1989 5 30 1989 NA 1.5 0 5/31/1989 5 31 1989 NA 0.14 0 6/1/1989 6 1 1989 1 0.97 0

Mary Moses
  • 13
  • 3
  • 5
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 12 '18 at 18:45
  • 1
    Thank you, I would love to have done that, and attempted to do so but I could not get tables to work. I feel so out of my element with all of this but I really am trying. Thank you for your feedback and I will figure out how to provide that and do so later. I was anxious to see if anyone had some quick direction and so gave up on the tables in the question for the moment as I was getting overwhelmed. – Mary Moses Mar 12 '18 at 21:12

1 Answers1

1

Here is a solution using which(!is.na()) and purrr::map. These are good sources to learn more about the tidyverse and purrr.

library(tidyverse)

str(dat)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    216 obs. of  7 variables:
#>  $ date  : chr  "11/11/1988" "11/12/1988" "11/13/1988" "11/14/1988" ...
#>  $ month : int  11 11 11 11 11 11 11 11 11 11 ...
#>  $ day   : int  11 12 13 14 15 16 17 18 19 20 ...
#>  $ year  : int  1988 1988 1988 1988 1988 1988 1988 1988 1988 1988 ...
#>  $ samp  : int  NA NA NA NA NA NA NA NA NA NA ...
#>  $ precip: num  0 0 0.55 0 0 0.52 0 0 0 0.39 ...
#>  $ snow  : int  0 0 0 0 0 0 0 0 0 0 ...

# Extra: convert date from character to date format
dat <- dat %>% 
  mutate(date = as.Date(date, "%m/%d/%Y"))

Find the row locations at which samp is not NA

idx <- which(!is.na(dat$samp))
idx
#> [1] 111 174 203

Next we loop through these row indices then extract values 5 days prior to them

idx %>% 
  map(. , function(x) dat[(x-5):(x), ])

#> [[1]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-02-24     2    24  1989    NA     0.     0
#> 2 1989-02-25     2    25  1989    NA     0.     0
#> 3 1989-02-26     2    26  1989    NA     0.     0
#> 4 1989-02-27     2    27  1989    NA     0.     0
#> 5 1989-02-28     2    28  1989    NA     0.     0
#> 6 1989-03-01     3     1  1989     1     0.     0
#> 
#> [[2]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-04-28     4    28  1989    NA  0.280     0
#> 2 1989-04-29     4    29  1989    NA  0.        0
#> 3 1989-04-30     4    30  1989    NA  0.        0
#> 4 1989-05-01     5     1  1989    NA  0.        0
#> 5 1989-05-02     5     2  1989    NA  0.        0
#> 6 1989-05-03     5     3  1989     0  0.280     0
#> 
#> [[3]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-05-27     5    27  1989    NA  0.        0
#> 2 1989-05-28     5    28  1989    NA  0.        0
#> 3 1989-05-29     5    29  1989    NA  0.        0
#> 4 1989-05-30     5    30  1989    NA  1.50      0
#> 5 1989-05-31     5    31  1989    NA  0.140     0
#> 6 1989-06-01     6     1  1989     1  0.970     0

If we want the result in a data frame

idx %>% 
  map_df(. , function(x) dat[(x-5):(x), ])

#> # A tibble: 18 x 7
#>    date       month   day  year  samp precip  snow
#>    <date>     <int> <int> <int> <int>  <dbl> <int>
#>  1 1989-02-24     2    24  1989    NA  0.        0
#>  2 1989-02-25     2    25  1989    NA  0.        0
#>  3 1989-02-26     2    26  1989    NA  0.        0
#>  4 1989-02-27     2    27  1989    NA  0.        0
#>  5 1989-02-28     2    28  1989    NA  0.        0
#>  6 1989-03-01     3     1  1989     1  0.        0
#>  7 1989-04-28     4    28  1989    NA  0.280     0
#>  8 1989-04-29     4    29  1989    NA  0.        0
#>  9 1989-04-30     4    30  1989    NA  0.        0
#> 10 1989-05-01     5     1  1989    NA  0.        0
#> 11 1989-05-02     5     2  1989    NA  0.        0
#> 12 1989-05-03     5     3  1989     0  0.280     0
#> 13 1989-05-27     5    27  1989    NA  0.        0
#> 14 1989-05-28     5    28  1989    NA  0.        0
#> 15 1989-05-29     5    29  1989    NA  0.        0
#> 16 1989-05-30     5    30  1989    NA  1.50      0
#> 17 1989-05-31     5    31  1989    NA  0.140     0
#> 18 1989-06-01     6     1  1989     1  0.970     0

More compacted form where you replace function(x) & x with "~" & "."

idx %>% 
  map_df(~ dat[(. -5):(.), ])

Created on 2018-03-12 by the reprex package (v0.2.0).

Tung
  • 26,371
  • 7
  • 91
  • 115
  • 1
    Thank you so much! It can be intimidating to post questions on here and feedback can be rude and belittling. Your comment was extremely helpful, I am going to study many of the things you mentioned and it is a great starting point for my statistical analysis. The break out of the sampling dates and associated days prior is exactly what I was thinking of but did not how to articulate in R. Thank you for your time and clear explanation of your solution. – Mary Moses Mar 13 '18 at 15:20
  • You're welcome! I'm sorry you felt that way about SO. I can understand other posters as there are hundreds of posts everyday asking for help without showing any effort or providing any reproducible data. From my own experience, you need to do your homework first to get the most out of SO. There are many knowledgeable brilliant people here. Anyway I recommend you to spend good amount of time going through these: [R for data science](http://r4ds.had.co.nz/) and [R for Earth-System Science](http://geog.uoregon.edu/bartlein/courses/geog490/index.html). Cheers! – Tung Mar 13 '18 at 16:10