-1

I need to be able to find the differences between consecutive values in a column of a dataframe and to output these differences as a new column amended to the same dataframe.

Here's a head of a section of the data I am working with:

    win0InitialChi2 win0FinalChi2 win1InitialChi2 win1FinalChi2 Det Bkgd     CH4 
1     764.0370000   0.000667857     2.19721e+03      4.24e-05    2.71747    1.92573   
2       0.0483236   0.000655749     1.66682e-03      4.79e-05    2.71742    1.92492   
3       0.0438540   0.000674729     9.08875e-04      5.63e-05    2.71709    1.91805   
4       0.0452078   0.000677721     6.41243e-04      5.09e-05    2.71769    1.93004   
5       0.0476242   0.000658611     5.76541e-04      3.99e-05    2.71747    1.92895   
6       0.0425603   0.000667191     2.34680e-03      4.74e-05    2.71741    1.92236   

I need to get the difference between consecutive values in the CH4 column, so entry 2 - entry 1, entry 3 - entry 2, and so on, and these to be output as a new column.

Edit: I have found how to do this using diff(), my next step is to find the percentage change. That is, what is 100*((entry 2 - entry 1) / entry 1). I am stuck at how to write a generalized operation for identifying the specific entry I need to divide by (in this case entry 1, but if I was to do 100*((entry 3 - entry 2) / entry 2) it would be entry 2, and so on. Thanks to anyone in advance.

Seanie_13
  • 21
  • 6
  • 1
    Does this answer your question? [difference in consecutive rows on data frames in a list](https://stackoverflow.com/questions/68753643/difference-in-consecutive-rows-on-data-frames-in-a-list) – waterloos Aug 12 '21 at 15:54
  • It does, though I did find a simpler route. See my edit too if you don't mind; I'm not at trying to identify this change as a percentage, which is harder than I expected. Thanks! – Seanie_13 Aug 12 '21 at 16:00

2 Answers2

2

The thread linked by @waterloos might work, but your solution can be simpler -- just use dplyr::lag()

library(dplyr)

df <- tibble(x = sample(1:10, 5))

df <- mutate(df, y = x - lag(x))

df

Edited to add percentage change column
You can calculate percent change from the previous row in a similar way:

df <- mutate(df,
             pct_change = (x - lag(x))/x
             )
  • Thanks Jacob, I actually managed to find a function that worked just as well by a similar mechanism - diff() - but am still stuck on how to then make this difference a percentage change. The math is obviously simple, the function less so as I am new to R. Any ideas? Thanks! – Seanie_13 Aug 12 '21 at 16:02
  • actually I take it back, your method ends up being more effective. My need to figure out percentages still stands though, if you have suggestions – Seanie_13 Aug 12 '21 at 16:09
  • @Seanie_13 Sure, see the edit in my answer above – Jacob Rothschild Aug 12 '21 at 17:11
0

Let us first prepare the sample data.frame just with 5 rows. The last column CH4 is from your data.frame.

df <- data.frame(
    date=c("12/31/2009","12/31/2010", "12/31/2011", "12/31/2012", "12/31/2013"),
    a=c(4.45, 4.55, 6.84, 6.84, 8.20),
    b=c(9.867, 11.333, 13.496, 11.804, 14.140),
    CH4=c(1.92573, 1.92492, 1.91805, 1.93004, 1.92895)
)

Let's define the function to calculate the growth rate of the given vector.

growth <- function (vec) {
    x0 <- vec[-length(vec)]
    x1 <- vec[-1]
    ((x1 - x0)/ x0)*100
}

It can be written in one line but I splitted into three lines so you can see how I calculate it. x0 is the vector without the last element of the given vector. The index -length(vec) drops the last element.

x1 is the vector with the first element dropped. So now you have two vectors, x0 and x1, whose elements in the same index is lagged.

R computes element-wise so (x1 - x0) / x0 does the calculation on each elements in the same index.

Lastly, you can combine the calculated vector with the original one as:

cbind(df, CH4.growth=c(0, growth(df$CH4)))

Here, I added 0 in the beginning of the calculated vector as c(0, growth(df$CH4)) since the length of the growth ratios is one less.

The final result you get is as follows:

        date    a      b     CH4  CH4.growth
1 12/31/2009 4.45  9.867 1.92573  0.00000000
2 12/31/2010 4.55 11.333 1.92492 -0.04206197
3 12/31/2011 6.84 13.496 1.91805 -0.35689795
4 12/31/2012 6.84 11.804 1.93004  0.62511405
5 12/31/2013 8.20 14.140 1.92895 -0.05647551
waterloos
  • 410
  • 2
  • 7