1

I have a df like this in R:

A B
a 6
b 13
c 9
c 16
c 17
c 23

I want a new column df$C that holds the difference between consecutive values of B whenever the value in A repeats: B2 - B1 if A2 = A1, B3 - B2 if A3 = A2, and so on. Where A differs from the previous row (and in the first row), C is just the value of B. Like this:

A B C
a 6 6
b 13 13
c 9 9
c 16 7
c 17 1
c 23 6

I tried to write a simple ifelse() call, but I don't know how to compare two consecutive values in the same column.

Ronak Shah

2 Answers

4

The lag() function from the dplyr package provides the previous observation:

df = data.frame("A" = c("a","b","c","c","c","c"),
                "B" = c(6,13,9,16,17,23))

library(dplyr)

df$A2 = dplyr::lag(df$A)
df$B2 = dplyr::lag(df$B)

Then add column C, subtracting the previous value of B wherever A matches the previous row:

df$C = ifelse(!is.na(df$A2) & df$A2 == df$A, df$B - df$B2, df$B)
df <- df %>% select(A, B, C)

This should work.
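
For comparison, the same logic can probably be written as a single pipeline without the helper columns A2 and B2. This is only a sketch: it assumes dplyr is installed, and `same_as_prev` is a name made up here for illustration.

```r
library(dplyr)

df <- data.frame(A = c("a", "b", "c", "c", "c", "c"),
                 B = c(6, 13, 9, 16, 17, 23))

df <- df %>%
  mutate(
    # TRUE where A equals the A of the previous row; the first row has no
    # previous row, so the NA produced by lag() is turned into FALSE
    same_as_prev = !is.na(dplyr::lag(A)) & dplyr::lag(A) == A,
    # difference to the previous B where A repeats, otherwise B itself
    C = if_else(same_as_prev, B - dplyr::lag(B), B)
  ) %>%
  select(A, B, C)

print(df)
```

Calling `dplyr::lag()` with the namespace prefix avoids any confusion with the `lag()` function from the stats package.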

BPeif
  • Thanks a lot and sorry for the easy question. I'm quite new to R. – Maurice Hüttemann Oct 26 '21 at 13:21
  • `lag` only works on time series, at least for me. I just tried it, since I didn't know the function either, but it just returns a copy of the input vector. Can I learn something about that? – TobiO Oct 26 '21 at 13:27
  • @MauriceHüttemann Sorry for the 'easily'; I just meant that this function is easy to handle. I am new as well and just trying to help with the little knowledge I have! I edited my answer because, as TobiO said, the NA was not handled in my solution. – BPeif Oct 26 '21 at 13:46
  • @TobiO There is a lag() function in the dplyr package, which is not the same as the one in the stats package. – BPeif Oct 26 '21 at 13:47
  • 1
    Thanks for the clarification :-) Since the name clashes with the `lag()` that ships with base R, the invocation should be `dplyr::lag()` – TobiO Oct 26 '21 at 13:51
  • Yeah, you're right! I can change that too – BPeif Oct 26 '21 at 13:53
0

Base R's lag() (from the stats package) cannot work here, because it does not shift the values of a plain vector, so the shift has to be emulated. The first value of the shifted column will be NA by definition, as there is nothing to compare the first row to.
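
To make the difference between the two lag() functions from the comments concrete, here is a small sketch. It assumes dplyr is installed; the vector `x` and the `shifted_*` names are made up for illustration.

```r
x <- c(6, 13, 9, 16)

# stats::lag() leaves the values where they are and only attaches a "tsp"
# (time series) attribute, so a plain vector looks like an unchanged copy
shifted_stats <- stats::lag(x)
print(as.numeric(shifted_stats))

# dplyr::lag() moves the values down by one position and pads with NA
shifted_dplyr <- dplyr::lag(x)
print(shifted_dplyr)

# Base R emulation of that shift: prepend NA and drop the last element
shifted_base <- c(NA, x[-length(x)])
print(shifted_base)
```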

Building on @BPeif's answer:

df$A2=c(NA, df$A[-nrow(df)])
df$B2=c(NA, df$B[-nrow(df)])

df$C = ifelse(df$A2 == df$A, df$B - df$B2, df$B)
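
Note that the first row of C comes out as NA here, because comparing NA with anything gives NA. If the first row should keep its B value, as in the expected output of the question, one option is to make the comparison FALSE for the first row. A minimal sketch in base R; `same_as_prev` and `prev_B` are names made up for illustration:

```r
df <- data.frame(A = c("a", "b", "c", "c", "c", "c"),
                 B = c(6, 13, 9, 16, 17, 23))

n <- nrow(df)

# Compare each A with the A one row above; the first row has nothing
# above it, so it is treated as "not the same"
same_as_prev <- c(FALSE, df$A[-1] == df$A[-n])

# B shifted down by one row (NA in the first position)
prev_B <- c(NA, df$B[-n])

# Difference to the previous B where A repeats, otherwise B itself
df$C <- ifelse(same_as_prev, df$B - prev_B, df$B)

print(df)
```

This also avoids the helper columns A2 and B2 in the data frame.
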
TobiO
  • You are right, my solution does not work for the first observation, as I did not handle the NA. I tried your solution and it did not work for me either, but thanks to your answer I edited mine; it should be good now. Thanks! – BPeif Oct 26 '21 at 13:42
  • Funny, for me it works fine. Comparing NA with anything returns NA, and thus an NA in the result of `ifelse` – TobiO Oct 26 '21 at 13:55