I'm using satellite data to determine Net Primary Production (NPP) for over 100 sample locations. For every location, I need to obtain NPP values for every month (January to December) over a ten-year span (2007-2017). I need to find a way to automate this with code.

This is the structure of my data:

structure(list(Month = c("January-", "January-", "January-", 
"January-", "January-"), long = c(-179.916672, -179.75, -179.583328, 
-179.416672, -179.25), lat = c(39.916668, 39.916668, 39.916668, 
39.916668, 39.916668), npp = c(297.813, 304.971, 292.946, 296.196, 
285.804)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))

The coordinates for the first sample are 14.58, 168.03, and there is an exact match for every month between January and December. I need to find these values, but the dataset is very large. If anyone could help me automate this process in any way, I would be so grateful.

Gina
  • Not sure I understand - but are you looking for `dat[dat$long == 14.58 & dat$lat == 168.03, ]`? – zephryl Dec 03 '22 at 05:49
  • By seeing those numbers I doubt *"there is an exact match"*. [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) and [Why are these numbers not equal?](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) may be relevant. – Rui Barradas Dec 03 '22 at 06:53
  • So the first 5 locations are at exactly the same `lat` but slightly different `long`? – MarBlo Dec 03 '22 at 08:08
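
The floating-point concern raised in the comments can be illustrated with a small sketch. It assumes the data frame is called `dat` (as in the first comment) and that a tolerance of 1e-4 degrees is acceptable; the name `tol`, its value, and the rounded target coordinate are assumptions for illustration, not values from the question.

```r
# Toy grid in the shape of the question's dput() output
dat <- data.frame(
  Month = rep("January-", 5),
  long  = c(-179.916672, -179.75, -179.583328, -179.416672, -179.25),
  lat   = rep(39.916668, 5),
  npp   = c(297.813, 304.971, 292.946, 296.196, 285.804)
)

# Target coordinate as it might be written down by hand (rounded)
target_long <- -179.9167
target_lat  <- 39.9167

# Exact comparison against the rounded coordinate finds no rows
exact <- dat[dat$long == target_long & dat$lat == target_lat, ]

# Comparison within a tolerance (tol is an assumed value, in degrees)
tol <- 1e-4
near_rows <- dat[abs(dat$long - target_long) < tol &
                 abs(dat$lat  - target_lat)  < tol, ]
```

Here `exact` is empty while `near_rows` holds the first grid cell, which is why a tolerance (or a nearest-cell lookup) tends to be safer than `==` for stored coordinates.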

1 Answer

From what I understand, your example data is insufficient. I have therefore created a data frame with 3 example locations, each with a random lat and long. I created 1000 random dates in the time frame mentioned and 1000 random npp values, see below (1000 to avoid too many NAs in the table below). This data frame assumes that every location delivers npp values on the same days.

After adding helper columns for year and month, npp is summed by year, month, and location and shown in a wider format. That is my understanding of "every month in a ten-year span".

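library(lubridate)
library(tidyverse)

# df is built in the Data section below
df |> 
  mutate(m = month(date, label = TRUE),  # month as a labelled factor
         y = year(date)) |> 
  group_by(y, m, location) |> 
  summarise(sum = sum(npp)) |>           # total npp per year, month, location
  pivot_wider(names_from = m, values_from = sum)  # one column per month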
#> `summarise()` has grouped output by 'y', 'm'. You can override using the
#> `.groups` argument.
#> # A tibble: 33 × 14
#> # Groups:   y [11]
#>        y location   Jan   Feb   Mär   Apr   Mai   Jun   Jul   Aug   Sep   Okt
#>    <dbl> <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2007 A        2544. 1260. 2303. 1665. 1952. 2440. 2842. 2323. 1744. 2827.
#>  2  2007 B        2412. 1473. 2126. 1484. 1953. 2726. 3251. 2249. 1924. 2598.
#>  3  2007 C        2212. 1460. 2233. 1816. 2085. 2604. 2871. 1996. 1960. 2714.
#>  4  2008 A        2397. 2141. 2352. 3375. 2045. 1757. 1476. 2813. 3169. 3593.
#>  5  2008 B        2562. 2314. 2299. 3634. 1879. 1544. 1568. 2805. 3101. 3712.
#>  6  2008 C        2487. 2269. 2159. 3740. 1727. 1631. 1462. 3048. 2872. 3742.
#>  7  2009 A        1538. 1241. 2434. 1916. 2757. 1937. 1720. 1335. 2600. 2809.
#>  8  2009 B        1643. 1312. 2410. 2170. 2817. 1973. 1566. 1253. 2720. 2758.
#>  9  2009 C        1549. 1231. 2331. 2490. 2766. 1886. 1472. 1354. 2810. 2727.
#> 10  2010 A        2732. 2463. 1220.  846. 2538. 4352.  948. 3826. 3062. 2423.
#> # … with 23 more rows, and 2 more variables: Nov <dbl>, Dez <dbl>

Data

set.seed(123)

# make 1000 random dates between 2007-01-01 and 2017-01-01
date <- sample(seq(as_date('2007/01/01'), as_date('2017/01/01'), by = "day"), 1000)

location <- rep(LETTERS[1:3], 1000)      # 3 locations
long <- rep(runif(3, -180, -120), 1000)  # with 3 longitudes
lat <- rep(runif(3, 30, 40), 1000)       # and 3 latitudes
npp <- runif(1000, 200, 350)             # 1000 npp values

# build the data frame: each date is repeated 3 times (once per location);
# npp (length 1000) is recycled to fill the 3000 rows
df <- data.frame(location, date = rep(date, each = 3), long, lat, npp)
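
To connect this back to the gridded data in the question, one possible way to pick the grid cell closest to each sample location is sketched below in base R. The `grid` and `samples` data frames, their values, and the column names `grid_long` and `grid_lat` are made up for illustration; the squared difference in degrees is only a rough proxy for distance, not a great-circle calculation.

```r
# Hypothetical grid for one month (column names follow the question)
grid <- expand.grid(long = seq(-180, -179, by = 0.25),
                    lat  = seq(39, 40, by = 0.25))
grid$npp <- seq_len(nrow(grid))  # placeholder values, not real NPP

# Hypothetical sample locations
samples <- data.frame(id   = c("S1", "S2"),
                      long = c(-179.8, -179.1),
                      lat  = c(39.9, 39.3))

# For each sample, find the row of the grid cell with the smallest
# squared difference in degrees
nearest <- vapply(seq_len(nrow(samples)), function(i) {
  which.min((grid$long - samples$long[i])^2 +
            (grid$lat  - samples$lat[i])^2)
}, integer(1))

# Attach the matched cell's coordinates and npp to each sample
result <- data.frame(samples,
                     grid_long = grid$long[nearest],
                     grid_lat  = grid$lat[nearest],
                     npp       = grid$npp[nearest])
```

If the grid is the same for every month, the lookup probably only needs to run once; the matched `grid_long` and `grid_lat` could then be used to filter or merge the monthly tables before summarising as above.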
MarBlo