
I have a data.frame with several measurements of tree diameter. What I'm trying to do is compute the cumulative sum of the variable dbh_increase, which is itself the product of a mutate() operation (I hope I'm being clear here).

My data.frame: https://www.dropbox.com/s/9usbu2kubbdyheu/bddendro.csv?dl=0

And here's the whole code I'm running:

library(dplyr)

bddendro <- read.table("bddendro.csv", header = TRUE, sep = ";", dec = ",")
bddendro$dbh_new <- (bddendro$cbh_init + (bddendro$dendro_length * 0.2)) / pi

bddendro <- bddendro %>%
  filter(med != 0) %>%
  group_by(parc, tree) %>%
  mutate(dbh_increase = ifelse(dendro_length < lag(dendro_length), 0, dbh_new - lag(dbh_new))) %>%
  mutate(dbh_cumsum = cumsum(dbh_increase))

The first mutate() works fine, at least as I'd expect, but the second one isn't working: it returns only NA values.

SOLUTION:

cumsum() propagates NA values: once it hits an NA, every later element of the result is NA too. So I used mutate() to change the NAs to 0, with the code below:

mutate(dbh_increase = ifelse(is.na(dbh_increase), 0, dbh_increase)) 
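
To illustrate the behaviour described above, here is a minimal base R sketch. The vector `increase` is made-up data, not taken from bddendro.csv; it only mimics a group whose first dbh_increase is NA.

```r
# Hypothetical increments for one tree; the first value is NA because
# there is no previous measurement to compare against.
increase <- c(NA, 0.5, 0.2, 0.3)

# cumsum() carries the NA forward into every later position.
cumsum(increase)
# NA NA NA NA

# Replacing NA with 0 first gives the running total.
increase_filled <- ifelse(is.na(increase), 0, increase)
cumsum(increase_filled)
# 0.0 0.5 0.7 1.0
```

Treating the missing first increment as 0 is a modelling choice: it assumes the first measurement of each tree is the baseline from which growth is accumulated.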
marc_s
  • I'm not sure you've actually asked a question here. Be sure to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run and test the code ourselves. – MrFlick Aug 14 '17 at 14:04
  • You have grouped by `parc` and `tree` (it seems from your posted data). These combinations have only one observation. The `lag` is calculated only within each grouping, and therefore becomes `NA` (there is never a previous observation in groups of 1). – Axeman Aug 14 '17 at 14:04
  • I'll edit my post with the information you asked for. – Aníbal Deboni Neto Aug 14 '17 at 16:47
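
The point about lag() inside groups can be checked with a small sketch. The two data frames below are hypothetical; only the column names `tree` and `dbh_new` are borrowed from the question.

```r
library(dplyr)

# Hypothetical data: every tree has exactly one row.
singles <- data.frame(tree = c(1, 2, 3), dbh_new = c(10, 20, 30)) %>%
  group_by(tree) %>%
  mutate(prev = lag(dbh_new)) %>%
  ungroup()
singles$prev
# NA NA NA  (no previous row exists in any group)

# Hypothetical data: two rows per tree.
pairs <- data.frame(tree = c(1, 1, 2, 2), dbh_new = c(10, 11, 20, 22)) %>%
  group_by(tree) %>%
  mutate(prev = lag(dbh_new)) %>%
  ungroup()
pairs$prev
# NA 10 NA 20  (only the first row of each group is NA)
```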

1 Answer

Your first mutate() creates NA values, because there is no lag(dbh_new) for the first row within each (parc, tree) group. cumsum() propagates an NA to every later element, so the second mutate() returns all NAs.

Try this instead:

bddendro2 <- bddendro %>%
  filter(med != 0) %>%
  group_by(parc, tree) %>%
  arrange(dendro_length) %>%
  mutate(dbh_increase = ifelse(is.na(lag(dbh_new)), 0, dbh_new - lag(dbh_new))) %>%
  mutate(dbh_cumsum = cumsum(dbh_increase)) %>%
  ungroup()
Z.Lin
  • Thank you so much, it works like a charm. I'm still trying to understand why is.na(lag(dbh_new)) works the same as "dendro_length < lag(dendro_length)", but I'll get it at some point. Thanks again. – Aníbal Deboni Neto Aug 15 '17 at 12:11
  • In fact, I ran a more detailed test and it wasn't working as I'd expect. I made a few changes and now it seems to be working: adding a new line to my original code that mutates the NA values of dbh_increase to 0 did the job. – Aníbal Deboni Neto Aug 15 '17 at 12:31
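
The approach described in this last comment (keep the original condition, then set the NA increments to 0 before cumsum()) might look roughly like the sketch below. The `toy` data frame is invented for illustration and does not come from bddendro.csv; in particular, dbh_new is typed in directly rather than derived from cbh_init.

```r
library(dplyr)

# Hypothetical data: two trees in one plot. For tree 1 the dendro_length
# drops between the 2nd and 3rd readings, so that step counts as 0.
toy <- data.frame(
  parc          = c(1, 1, 1, 1, 1, 1),
  tree          = c(1, 1, 1, 2, 2, 2),
  dendro_length = c(10, 20, 5, 0, 10, 30),
  dbh_new       = c(10, 11, 11.5, 20, 21, 23)
)

result <- toy %>%
  group_by(parc, tree) %>%
  mutate(
    # Original rule from the question: no increase when dendro_length drops.
    dbh_increase = ifelse(dendro_length < lag(dendro_length), 0, dbh_new - lag(dbh_new)),
    # First row of each group has no lag, so the rule yields NA; use 0 instead.
    dbh_increase = ifelse(is.na(dbh_increase), 0, dbh_increase),
    dbh_cumsum = cumsum(dbh_increase)
  ) %>%
  ungroup()

result$dbh_cumsum
# 0 1 1 0 1 3
```

Unlike the answer's version, this sketch does not sort by dendro_length, so the "dendro_length < lag(dendro_length)" rule can still fire. It assumes the rows are already in measurement order within each tree.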