
Here is what my data looks like:

tibble [1,702,551 x 4] (S3: tbl_df/tbl/data.frame)
$ date   : Date[1:1702551], format: "2011-04-12" "2011-04-12" ...
$ wlength: num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
$ ID     : chr [1:1702551] "c01" "c01" "c01" "c01" ...
$ R      : num [1:1702551] 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 ...

head(fdata)
A tibble: 6 x 4
date       wlength ID        R
<date>       <dbl> <chr> <dbl>
1 2011-04-12     350 c01   0.009
2 2011-04-12     351 c01   0.009
3 2011-04-12     352 c01   0.009
4 2011-04-12     353 c01   0.009
5 2011-04-12     354 c01   0.009
6 2011-04-12     355 c01   0.009

Quick explanation of the data: over 9 years, data was collected on various dates (date) on the reflectance of different kinds of vegetation (ID), for instance "c01", "h07", ..., at each wavelength (wlength); the associated reflectance value is (R).

I want to apply this equation of the Normalized Difference Vegetation Index (NDVI):

(R800-R670)/(R800+R670)

The numbers after R are the wavelengths (wlength). Basically, for each "date" and each "ID", I want to extract the values of R where the wavelength is equal to 800 and to 670, and apply the equation.

How can I address all these variables in order to apply this equation to my data?

Any help would be much appreciated. Thank you.

Cláudio Siva
  • How do you calculate R? Does the result (equation) need to be a vector or a scalar? – Eyayaw Jan 18 '21 at 15:36
  • R is the average reflectance by the number of repetitions of a specific vegetation code, for instance, h01 at a specific wavelength. The result needs to be scalar. – Cláudio Siva Jan 18 '21 at 16:26

3 Answers


Here's a possibility using the tidyverse:

library(tidyverse)

fdata <-
  tribble(
          ~date , ~wlength , ~ID , ~R,
          "2011-04-12", 354 , "c01" , 0.022 ,
          "2011-04-12", 800 , "c01" , 0.014,
          "2011-04-12", 670 , "c01" , 0.009,
          "2011-04-15", 355 , "h07" , 0.012,
          "2011-04-15", 800 , "h07" , 0.003,
          "2011-04-15", 670 , "h07" , 0.077
  )

est_ndvi <-
  fdata %>%
  group_by(date, ID) %>%
  filter(wlength %in% c(670, 800)) %>%   # keep only the two NDVI wavelengths
  pivot_wider(names_from = wlength, names_prefix = "R", values_from = R) %>%   # -> columns R800, R670
  mutate(ndvi = (R800 - R670) / (R800 + R670))
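
On the toy data above, this should yield one row per date/ID pair, roughly (values hand-computed from the tribble, rounded to three digits):

  date       ID     R800  R670   ndvi
1 2011-04-12 c01   0.014 0.009  0.217
2 2011-04-15 h07   0.003 0.077 -0.925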
meriops

Not so beautiful, but it should work:

library(dplyr)

data <- tibble(
  date = c("2020-01-01", "2020-01-01", "2020-01-02"),
  wlength = c(800, 670, 800),
  ID = c('c01', 'c01', 'c01'),
  R = c(1, 2, 3))

data

reduced <- data %>%
  filter(wlength %in% c(800, 670)) %>%
  mutate(
    # split R into one column per wavelength, NA elsewhere
    R800 = ifelse(wlength == 800, R, NA),
    R670 = ifelse(wlength == 670, R, NA)) %>%
  group_by(date, ID) %>%
  summarise(
    # max() with na.rm = TRUE collapses each column to its single non-NA value
    R800 = max(R800, na.rm = TRUE),
    R670 = max(R670, na.rm = TRUE),
    NDVI = (R800 - R670) / (R800 + R670))

reduced
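
One caveat: the toy data above has a reading at 800 but none at 670 for 2020-01-02, so max(R670, na.rm = TRUE) returns -Inf with a warning and the NDVI for that group comes out NaN. A minimal guard, assuming you simply want to drop incomplete date/ID groups, is to require both wavelengths before summarising:

data %>%
  filter(wlength %in% c(800, 670)) %>%
  group_by(date, ID) %>%
  filter(n_distinct(wlength) == 2) %>%   # keep only date/ID groups with both wavelengths
  summarise(
    R800 = R[wlength == 800],            # assumes one reading per wavelength per group
    R670 = R[wlength == 670],
    NDVI = (R800 - R670) / (R800 + R670))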
randomchars42

Up front, please see the note about floating-point equality below. While it may not bite you with this data, one problem with floating-point equality filtering is that you may not know it is occurring, and your calculations will be silently incorrect.

Two alternative solutions:

tidyverse, take 1

library(dplyr)
fdata %>%
  arrange(-wlength) %>%                    # sort descending by wavelength
  filter(wlength %in% c(352L, 350L)) %>%   # demo wavelengths; use c(800L, 670L) with the real data
  group_by(date, ID) %>%
  filter(n() == 2L) %>%                    # keep only groups that have both wavelengths
  summarize(
    quux = diff(R) / sum(R),               # diff() is second minus first: R_low - R_high here
    .groups = "drop"
  )
# # A tibble: 4 x 3
#   date       ID      quux
#   <chr>      <chr>  <dbl>
# 1 2011-04-12 c01   -0.223
# 2 2011-04-12 c02   -0.152
# 3 2011-04-13 c01   -0.120
# 4 2011-04-13 c02    0.745

tidyverse, take 2

func <- function(wl, r, wavelengths = c(800, 670)) {
  # index of the first reading within 1 unit of each target wavelength
  inds <- sapply(wavelengths, function(w) {
    diffs <- abs(wl - w)
    which(diffs < 1)[1]
  })
  # diff() is second minus first: r at wavelengths[2] minus r at wavelengths[1]
  diff(r[inds]) / sum(r[inds])
}
fdata %>%
  group_by(date, ID) %>%
  summarize(
    quux = func(wlength, R, c(352, 350)),
    .groups = "drop"
  )
# # A tibble: 4 x 3
#   date       ID      quux
#   <chr>      <chr>  <dbl>
# 1 2011-04-12 c01   -0.223
# 2 2011-04-12 c02   -0.152
# 3 2011-04-13 c01   -0.120
# 4 2011-04-13 c02    0.745
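
One sign detail worth flagging in both takes: diff() returns the later element minus the earlier one, so as written these compute (R_low - R_high) / (R_high + R_low), the negative of the NDVI in the question. On the real data, passing the wavelengths in ascending order should give the NDVI directly (a sketch, assuming every date/ID group really has readings near 670 and 800):

fdata %>%
  group_by(date, ID) %>%
  summarize(
    ndvi = func(wlength, R, c(670, 800)),  # R670 indexed first, so diff() yields R800 - R670
    .groups = "drop"
  )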

Floating-point Equality

Your wlength is a numeric field, and testing floating-point numbers for strict equality carries occasional risks. Computers have limitations when it comes to floating-point numbers (aka double, numeric, float); this is a fundamental limitation of how computers in general deal with non-integer numbers, not something specific to any one programming language. There are add-on libraries or packages that are much better at arbitrary-precision math, but I believe most mainstream languages (this is relative/subjective, I admit) do not use them by default. Refs: Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
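
To see the issue concretely in R (classic examples, unrelated to this data):

0.1 + 0.2 == 0.3
# [1] FALSE
print(0.1 + 0.2, digits = 17)
# [1] 0.30000000000000004
all.equal(0.1 + 0.2, 0.3)   # tolerance-based comparison
# [1] TRUE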

Strict equality on integers is not a problem, and in my sample data the wavelengths are integers. You have a few options for dealing with this, typically by injecting/replacing components of the %>% pipe.

  1. Convert to integer,

    mutate(wlength = as.integer(wlength))
    
  2. Filter with specific tolerance, perhaps

    filter(abs(wlength - 800) < 0.1 | abs(wlength - 670) < 0.1)
    
  3. Temporary conversion,

    filter(sprintf("%0.0f", wlength) %in% c("800", "670"))
    

    (not the most efficient, but effective and can support off-integer wavelengths).


Data

fdata <- read.table(header = TRUE, text = "
date       wlength ID
2011-04-12     350 c01
2011-04-12     351 c01
2011-04-12     352 c01
2011-04-12     353 c01
2011-04-12     354 c01
2011-04-12     355 c01
2011-04-13     350 c01
2011-04-13     351 c01
2011-04-13     352 c01
2011-04-13     353 c01
2011-04-13     354 c01
2011-04-13     355 c01
2011-04-12     350 c02
2011-04-12     351 c02
2011-04-12     352 c02
2011-04-12     353 c02
2011-04-12     354 c02
2011-04-12     355 c02
2011-04-13     350 c02
2011-04-13     351 c02
2011-04-13     352 c02
2011-04-13     353 c02
2011-04-13     354 c02
2011-04-13     355 c02
")
set.seed(2021)
fdata$R <- round(runif(nrow(fdata)), 3)
r2evans