2

Here is the example dataframe:

data.frame(sample = c('A','A','A','A','A','B','B','B','B','B'),
                 measure = c(20,30,40,60,60,20,60,50,40,10),
                 time = c(1,2,3,4,5,3,4,5,6,7),
                 start = c(1,1,1,1,1,3,3,3,3,3),
                 end = c(4,4,4,4,4,6,6,6,6,6))
   sample measure time start end
1       A      20    1     1   4
2       A      30    2     1   4
3       A      40    3     1   4
4       A      60    4     1   4
5       A      60    5     1   4
6       B      20    3     3   6
7       B      60    4     3   6
8       B      50    5     3   6
9       B      40    6     3   6
10      B      10    7     3   6

What I want to do is create a new column called "calc" that calculates the measure for time=end / time=start

I'm guessing group_by sample first but I'm having trouble figuring out how to reference the start and end times. Here's what the final data should look like

   sample measure time start end calc
1       A      20    1     1   4    3
2       A      30    2     1   4    3
3       A      40    3     1   4    3
4       A      60    4     1   4    3
5       A      60    5     1   4    3
6       B      20    3     3   6    2
7       B      60    4     3   6    2
8       B      50    5     3   6    2
9       B      40    6     3   6    2
10      B      10    7     3   6    2
user438383
  • 5,716
  • 8
  • 28
  • 43
yixu501
  • 23
  • 3
  • 2
    I don't get it. Why is `calc` = 3 for `sample` A and not equal to 4? – Jan Aug 20 '23 at 20:54
  • Sorry if it's confusing. calc for SampleA is measure at time=4 (60) divided by measure at time=1 (20), which equals 3 – yixu501 Aug 20 '23 at 21:09

2 Answers2

1

what about (d being your dataframe):

library(dplyr)

d |>
  arrange(sample, time) |>
  rowwise() |>
  mutate(is_endpoint = any(time == start, time == end)) |>
  ungroup() |>
  mutate(calc = tail(measure[is_endpoint], 1) /
           head(measure[is_endpoint], 1),
         .by = sample
         ) |>
  select(-is_endpoint)

output:

 # A tibble: 10 x 6
   sample measure  time start   end  calc
   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
 1 A           20     1     1     4     3
 2 A           30     2     1     4     3
 3 A           40     3     1     4     3
 4 A           60     4     1     4     3
 5 A           60     5     1     4     3
 6 B           20     3     3     6     2
 7 B           60     4     3     6     2
 8 B           50     5     3     6     2
 9 B           40     6     3     6     2
10 B           10     7     3     6     2
I_O
  • 4,983
  • 2
  • 2
  • 15
  • Thank you! This works great assuming end measure is > start measure, which was all I had in the original data. But if I set sample B end measure at t = 6 to be equal to 10, instead of 0.5 (10/20) I would get 2 because it's taking max/min. – yixu501 Aug 20 '23 at 22:19
  • Oops. See edited version, please. To make sure, data are now first sorted by group and time, so that `head` and `tail` indeed return the relevant endpoints. – I_O Aug 20 '23 at 22:31
  • Works perfectly thank you! – yixu501 Aug 20 '23 at 22:57
0

I think you might have made an error in your expected results. Sample A rows seem to be a difference while sample B seems to be a ratio. If you wanted the latter, shouldn't calc be 4 for all the A samples.

If we start with your data frame as df:

df<-data.frame(sample = c('A','A','A','A','A','B','B','B','B','B'),
                 measure = c(20,30,40,60,60,20,60,50,40,10),
                 time = c(1,2,3,4,5,3,4,5,6,7),
                 start = c(1,1,1,1,1,3,3,3,3,3),
                 end = c(4,4,4,4,4,6,6,6,6,6))

The data.table package lets you do either calculation in one line

 library(data.table)
 DT<-data.table(df)
    
 DT[,calc_ratio:=end/start]
 DT[,calc_difference:=end-start]
 DT

to give:

    sample measure time start end calc_ratio calc_difference
 1:      A      20    1     1   4          4               3
 2:      A      30    2     1   4          4               3
 3:      A      40    3     1   4          4               3
 4:      A      60    4     1   4          4               3
 5:      A      60    5     1   4          4               3
 6:      B      20    3     3   6          2               3
 7:      B      60    4     3   6          2               3
 8:      B      50    5     3   6          2               3
 9:      B      40    6     3   6          2               3
10:      B      10    7     3   6          2               3
KJG
  • 23
  • 5
  • So the calc column is calculated from the measure column and not the time. The time just indicates which start and end measure columns to take – yixu501 Aug 20 '23 at 21:16