
I have been trying to add a new column to my data frame that holds the difference between consecutive rows of one column, computed separately within each "sub-data frame" (each "id_n").

My data frame looks like this:

dput(df[1:30,c(2,5,6,9,14,15)])

structure(list(gen_spe = c("holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads", "holo_ads"), ori = c("guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad", "guad"), spe = c("ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads", "ads" ), id_n = c("1_1", "1_1", "1_1", "1_1", "1_10", "1_10", "1_10", "1_10", "1_11", "1_11", "1_11", "1_11", "1_12", "1_12", "1_12", "1_12", "1_13", "1_13", "1_13", "1_13", "1_14", "1_14", "1_14", "1_15", "1_15", "1_15", "1_16", "1_16", "1_16", "1_16"), npu = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4), duper = c(0.00997, 0.01002, 0.01213, NA, 0.01049, 0.01024, 0.01292, NA, 0.01054, 0.01009, 0.01424, NA, 0.01088, 0.01027, 0.01444, NA, 0.0102, 0.00995, 0.01165, NA, 0.01079, 0.01047, NA, 0.01061, 0.01129, NA, 0.01038, 0.0102, 0.01317, NA)), row.names = c(NA, 30L), class = "data.frame")

So, we have something like:

id_n <- c("1_1","1_1","1_1","1_1","2_1","2_2","2_3","2_4","3_1","3_2")
duper <- c("0.00997","0.01002","0.01213", "NA", "0.01024", "0.01024", "0.01258", "NA", "0.01045", "0.01020")
npu <- c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2")
x <- data.frame(id_n, duper, npu)

I would like R to give me a new column containing the differences between consecutive values of column 'duper', computed separately for each id_n.

For example, for id_n = 1_1: 0.01002 - 0.00997; 0.01213 - 0.01002. For id_n = 1_2: 0.01024 - 0.01024; 0.01258 - 0.01024. For id_n = 1_3: 0.01061 - 0.01047. And so on.

I am able to make a list of the 'sub-data frames', to which I would then like to apply a function, but I do not know how to ask R to calculate this. The column 'npu' could be used, since its values run from 1 upward within each id_n.

Do you have some ideas?

Thank you very much,

Marine.

  • Please do not post photos of data or code! If you do, people who are willing to help you would have to type out all that text. Instead, provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). P.S. Here is [a good overview on how to ask a good question](https://stackoverflow.com/help/how-to-ask) – dario Oct 20 '21 at 11:59
  • Please just provide the dput(head(name of your dataset)) in order to help you – 12666727b9 Oct 20 '21 at 12:05
  • Sorry, I added some information and simple code to explain what I try to do. I hope this is fine! – Marine Oct 20 '21 at 12:25

1 Answer


Something like this (using tidyverse library):

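# Install once if needed, then load (only dplyr functions are used below)
install.packages("tidyverse")
library(tidyverse)

# duper was created as character in the example, so convert it to numeric
x$duper <- as.numeric(x$duper)

# Within each id_n, subtract the previous row's value from the current one
x %>%
  group_by(id_n) %>%
  mutate(new_col = duper - lag(duper))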

Returns this result:

# A tibble: 10 x 4
# Groups:   id_n [7]
   id_n     duper npu      new_col
   <chr>    <dbl> <chr>      <dbl>
 1 1_1    0.00997 1     NA        
 2 1_1    0.0100  2      0.0000500
 3 1_1    0.0121  3      0.00211  
 4 1_1   NA       4     NA        
 5 2_1    0.0102  1     NA        
 6 2_2    0.0102  2     NA        
 7 2_3    0.0126  3     NA        
 8 2_4   NA       4     NA        
 9 3_1    0.0104  1     NA        
10 3_2    0.0102  2     NA 
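
Rows 5 to 10 are all NA in this output because, in the toy data, every id_n after the first four rows occurs only once, so each of those groups has a single row and lag() has no previous value to use. Below is a minimal sketch of the same approach, assuming the toy ids were meant to repeat within a group. The ids "2_1" and "3_1", the name x2, and the arrange() step are my own illustrative assumptions, not part of the question:

```r
library(dplyr)

# Toy data with repeated ids (assumed grouping, for illustration only)
x2 <- data.frame(
  id_n  = c("1_1", "1_1", "1_1", "1_1", "2_1", "2_1", "2_1", "2_1", "3_1", "3_1"),
  duper = c(0.00997, 0.01002, 0.01213, NA, 0.01024, 0.01024, 0.01258, NA, 0.01045, 0.01020),
  npu   = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2)
)

res <- x2 %>%
  group_by(id_n) %>%
  arrange(npu, .by_group = TRUE) %>%      # keep rows in npu order within each id
  mutate(new_col = duper - lag(duper)) %>% # current value minus previous value
  ungroup()                                # drop the grouping for later steps

res
```

The first row of each id_n stays NA because it has no predecessor, and any difference that involves an NA in duper is NA as well.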
denisafonin
  • Thank you denisafonin! For the whole data frame, I have to write it like this: `df <- df %>% group_by(id_n) %>% mutate(new_col = duper - lag(duper))` – Marine Oct 20 '21 at 13:11
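
For reference, the same per-group difference can also be sketched in base R without extra packages. This is an alternative, not what the answer above uses. It assumes the rows are already sorted by npu within each id_n, as they are in the dput() output. The name df_small is hypothetical and holds only the first eight rows of that output:

```r
# First eight rows of the dput() data (id_n, npu, duper only)
df_small <- data.frame(
  id_n  = c("1_1", "1_1", "1_1", "1_1", "1_10", "1_10", "1_10", "1_10"),
  npu   = c(1, 2, 3, 4, 1, 2, 3, 4),
  duper = c(0.00997, 0.01002, 0.01213, NA, 0.01049, 0.01024, 0.01292, NA)
)

# ave() applies FUN to duper within each id_n and returns a vector
# aligned with the original rows; diff() gives consecutive differences,
# and the leading NA pads the first row of each group.
df_small$new_col <- ave(df_small$duper, df_small$id_n,
                        FUN = function(v) c(NA, diff(v)))

df_small
```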