
Here is what my data looks like:

tibble [1,702,551 x 4] (S3: tbl_df/tbl/data.frame)
$ date   : Date[1:1702551], format: "2011-04-12" "2011-04-12" ...
$ wlength: num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
$ ID     : chr [1:1702551] "c01" "c01" "c01" "c01" ...
$ R      : num [1:1702551] 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 ...

head(fdata)
A tibble: 6 x 4
date       wlength ID        R
<date>       <dbl> <chr> <dbl>
1 2011-04-12     350 c01   0.009
2 2011-04-12     351 c01   0.009
3 2011-04-12     352 c01   0.009
4 2011-04-12     353 c01   0.009
5 2011-04-12     354 c01   0.009
6 2011-04-12     355 c01   0.009

Quick explanation of the data: over 9 years, data was collected on various dates (date) on the reflectance of different kinds of vegetation (ID), for instance "c01", "h07", ..., at each wavelength (wlength); the associated reflectance value is (R).

I want to apply this equation of the Normalized Difference Vegetation Index (NDVI):

(R800-R670)/(R800+R670)

The numbers after R are the wavelengths (wlength). Basically, for each "date" and each "ID", I want to extract the values of R where the wavelength is equal to 800 and to 670, and apply the equation.

How can I address all these variables in order to apply this equation to my data?

Any help would be much appreciated. Thank you.

Cláudio Siva
  • How do you calculate R? Does the result (equation) need to be a vector or a scalar? – Eyayaw Jan 18 '21 at 15:36
  • R is the average reflectance by the number of repetitions of a specific vegetation code, for instance, h01 at a specific wavelength. The result needs to be scalar. – Cláudio Siva Jan 18 '21 at 16:26

3 Answers


Here's a possibility using the tidyverse:

library(tidyverse)

fdata <-
  tribble(
          ~date , ~wlength , ~ID , ~R,
          "2011-04-12", 354 , "c01" , 0.022 ,
          "2011-04-12", 800 , "c01" , 0.014,
          "2011-04-12", 670 , "c01" , 0.009,
          "2011-04-15", 355 , "h07" , 0.012,
          "2011-04-15", 800 , "h07" , 0.003,
          "2011-04-15", 670 , "h07" , 0.077
  )

est_ndvi <-
  fdata %>%
  group_by(date, ID) %>%
  filter(wlength %in% c(670, 800)) %>%   # keep only the two NDVI wavelengths
  pivot_wider(names_from = wlength, names_prefix = "R", values_from = R) %>%   # -> columns R800, R670
  mutate(ndvi = (R800 - R670) / (R800 + R670))
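
On the toy data above, this should yield one row per date/ID pair, roughly (values hand-computed from the tribble, rounded to three digits):

  date       ID     R800  R670   ndvi
1 2011-04-12 c01   0.014 0.009  0.217
2 2011-04-15 h07   0.003 0.077 -0.925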
meriops

Not so beautiful, but it should work:

library(dplyr)

data <- tibble(
  date = c("2020-01-01", "2020-01-01", "2020-01-02"),
  wlength = c(800, 670, 800),
  ID = c('c01', 'c01', 'c01'),
  R = c(1, 2, 3))

data

reduced <- data %>%
  filter(wlength %in% c(800, 670)) %>%
  mutate(
    # split R into one column per wavelength, NA elsewhere
    R800 = ifelse(wlength == 800, R, NA),
    R670 = ifelse(wlength == 670, R, NA)) %>%
  group_by(date, ID) %>%
  summarise(
    # max() with na.rm = TRUE collapses each column to its single non-NA value
    R800 = max(R800, na.rm = TRUE),
    R670 = max(R670, na.rm = TRUE),
    NDVI = (R800 - R670) / (R800 + R670))

reduced
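
One caveat: the toy data above has a reading at 800 but none at 670 for 2020-01-02, so max(R670, na.rm = TRUE) returns -Inf with a warning and the NDVI for that group comes out NaN. A minimal guard, assuming you simply want to drop incomplete date/ID groups, is to require both wavelengths before summarising:

data %>%
  filter(wlength %in% c(800, 670)) %>%
  group_by(date, ID) %>%
  filter(n_distinct(wlength) == 2) %>%   # keep only date/ID groups with both wavelengths
  summarise(
    R800 = R[wlength == 800],            # assumes one reading per wavelength per group
    R670 = R[wlength == 670],
    NDVI = (R800 - R670) / (R800 + R670))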
randomchars42

Up front, please see the note about floating-point equality below. While it may not bite you with this data, one problem with floating-point equality filtering is that you may not know it is occurring, and your calculations will be silently incorrect.

Two alternative solutions:

tidyverse, take 1

library(dplyr)
fdata %>%
  arrange(-wlength) %>%                    # sort descending by wavelength
  filter(wlength %in% c(352L, 350L)) %>%   # demo wavelengths; use c(800L, 670L) with the real data
  group_by(date, ID) %>%
  filter(n() == 2L) %>%                    # keep only groups that have both wavelengths
  summarize(
    quux = diff(R) / sum(R),               # diff() is second minus first: R_low - R_high here
    .groups = "drop"
  )
# # A tibble: 4 x 3
#   date       ID      quux
#   <chr>      <chr>  <dbl>
# 1 2011-04-12 c01   -0.223
# 2 2011-04-12 c02   -0.152
# 3 2011-04-13 c01   -0.120
# 4 2011-04-13 c02    0.745

tidyverse, take 2

func <- function(wl, r, wavelengths = c(800, 670)) {
  # index of the first reading within 1 unit of each target wavelength
  inds <- sapply(wavelengths, function(w) {
    diffs <- abs(wl - w)
    which(diffs < 1)[1]
  })
  # diff() is second minus first: r at wavelengths[2] minus r at wavelengths[1]
  diff(r[inds]) / sum(r[inds])
}
fdata %>%
  group_by(date, ID) %>%
  summarize(
    quux = func(wlength, R, c(352, 350)),
    .groups = "drop"
  )
# # A tibble: 4 x 3
#   date       ID      quux
#   <chr>      <chr>  <dbl>
# 1 2011-04-12 c01   -0.223
# 2 2011-04-12 c02   -0.152
# 3 2011-04-13 c01   -0.120
# 4 2011-04-13 c02    0.745
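
One sign detail worth flagging in both takes: diff() returns the later element minus the earlier one, so as written these compute (R_low - R_high) / (R_high + R_low), the negative of the NDVI in the question. On the real data, passing the wavelengths in ascending order should give the NDVI directly (a sketch, assuming every date/ID group really has readings near 670 and 800):

fdata %>%
  group_by(date, ID) %>%
  summarize(
    ndvi = func(wlength, R, c(670, 800)),  # R670 indexed first, so diff() yields R800 - R670
    .groups = "drop"
  )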

Floating-point Equality

Your wlength is a numeric field, and testing floating-point numbers for strict equality carries occasional risks. Computers have limitations when it comes to floating-point numbers (aka double, numeric, float); this is a fundamental limitation of how computers in general deal with non-integer numbers, not something specific to any one programming language. There are add-on libraries or packages that are much better at arbitrary-precision math, but I believe most mainstream languages (this is relative/subjective, I admit) do not use them by default. Refs: Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
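
To see the issue concretely in R (classic examples, unrelated to this data):

0.1 + 0.2 == 0.3
# [1] FALSE
print(0.1 + 0.2, digits = 17)
# [1] 0.30000000000000004
all.equal(0.1 + 0.2, 0.3)   # tolerance-based comparison
# [1] TRUE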

Strict equality on integers is not a problem, and in my sample data the wavelengths are integers. You have a few options for dealing with this, typically by injecting/replacing components of the %>% pipe.

  1. Convert to integer,

    mutate(wlength = as.integer(wlength))
    
  2. Filter with specific tolerance, perhaps

    filter(abs(wlength - 800) < 0.1 | abs(wlength - 670) < 0.1)
    
  3. Temporary conversion,

    filter(sprintf("%0.0f", wlength) %in% c("800", "670"))
    

    (not the most efficient, but effective and can support off-integer wavelengths).


Data

fdata <- read.table(header = TRUE, text = "
date       wlength ID
2011-04-12     350 c01
2011-04-12     351 c01
2011-04-12     352 c01
2011-04-12     353 c01
2011-04-12     354 c01
2011-04-12     355 c01
2011-04-13     350 c01
2011-04-13     351 c01
2011-04-13     352 c01
2011-04-13     353 c01
2011-04-13     354 c01
2011-04-13     355 c01
2011-04-12     350 c02
2011-04-12     351 c02
2011-04-12     352 c02
2011-04-12     353 c02
2011-04-12     354 c02
2011-04-12     355 c02
2011-04-13     350 c02
2011-04-13     351 c02
2011-04-13     352 c02
2011-04-13     353 c02
2011-04-13     354 c02
2011-04-13     355 c02
")
set.seed(2021)
fdata$R <- round(runif(nrow(fdata)), 3)
r2evans