
I am trying to calculate two things in R, relative and absolute change, and to show them in a scatter plot with two y-axes. I am still looking for input on how to create a two-y-axis plot for this type of data.

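library(data.table)
set.seed(123)
df <- expand.grid(PatientID = 1:3, time = 1:3, Group = 1:2)
# one Outcome value per row of df (18 rows); a fixed runif(9) does not match the row count
dat <- data.table(df, Outcome = as.integer(runif(nrow(df)) * 100))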

Sample of the data format (df):

PatientID time Outcome Group
     1     1    87 1
     1     2    32 1
     1     3    76 2
     2     1    21 2
     2     2    23 3
     2     3    23 3

 ## Continues for 200 PatientIDs (volunteers); the real data has many outcome-measure columns (columns 33:290)

PatientID, time, Outcome and Group denote the volunteer's identification number, the hospital visit number, the outcome measure of interest, and the group (whether the volunteer belongs to condition A or B), respectively. The data include 3 visits per participant and two groups.

  1. Relative change (%), i.e. the absolute change expressed as a percentage of the baseline value of the outcome, for Groups 1 & 2:

[(F - B) / B] * 100, where B and F are the baseline and follow-up values of an outcome measure

  2. Absolute change, i.e. F - B

  3. 2-y axes scatter plots:

The main purpose of this plot is to look at the changes in outcome measures with respect to baseline (time=1), and also to determine whether there are group differences. It would be useful to show the relative and absolute change values in the plot as y1 and y2.
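As a quick check of the two definitions, here is a minimal sketch in base R. The helper names `abs_change` and `rel_change` are made up for illustration; the values are Patient 1's first two visits from the sample table (87 at time 1, 32 at time 2).

```r
# Hypothetical helpers: baseline is B, followup is F in the formulas above
abs_change <- function(baseline, followup) followup - baseline
rel_change <- function(baseline, followup) 100 * (followup - baseline) / baseline

# Patient 1 in the sample: Outcome 87 at time 1, 32 at time 2
abs_change(87, 32)  # -55
rel_change(87, 32)  # about -63.2 (percent)
```

Note that the relative change is undefined (Inf or NaN in R) whenever the baseline value is 0, so such rows may need separate handling.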

I have made several scatterplots in ggplot2 and ggvis to view the trends, but I did not find a direct option in either package to calculate (and plot) relative and absolute change. I do recommend both packages to novice users like myself. In addition, I am planning to show the relative and absolute change values for one outcome measure in a single scatterplot, i.e. a plot with two y-axes.

Let me know if you need any clarification. Thanks, and I look forward to your answers!

Answers to questions 1 & 2 (posting them in case they help others)

This is how I finally did it:

library(dplyr)
# one data frame per visit; rows must be in the same PatientID order in all three
dft1 <- filter(df, time == 1)
dft2 <- filter(df, time == 2)
dft3 <- filter(df, time == 3)

To calculate the absolute change from the first time point to the second and to the third (columns 33:290 hold the outcome measures):

abs1 <- dft2[33:290] - dft1[33:290]
abs2 <- dft3[33:290] - dft1[33:290]

To calculate the relative change (%) over the same intervals:

rel1 <- abs1 / dft1[33:290] * 100
rel2 <- abs2 / dft1[33:290] * 100

I will put absolute change and relative change on different y-axes. This link was handy to get me started: (How can I plot with 2 different y-axes?).
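One possible way to get two y-axes in ggplot2 is `sec_axis()`, sketched below under some assumptions: a ggplot2 version that provides `sec_axis()`, the four changes from baseline taken from the sample table above, and a scaling factor `k` that is simply a choice made for this illustration. ggplot2 has only one real y scale, so the second axis is a relabelling of the first: the relative change is rescaled onto the absolute-change range and the secondary axis undoes that rescaling.

```r
library(ggplot2)

# Changes from baseline for the two patients in the sample table
baseline <- c(87, 87, 21, 21)
followup <- c(32, 76, 23, 23)
chg <- data.frame(time = c(2, 3, 2, 3),
                  abs.change = followup - baseline,
                  rel.change = 100 * (followup - baseline) / baseline)

# Assumed scaling factor: maps the rel.change range onto the abs.change range
k <- max(abs(chg$abs.change)) / max(abs(chg$rel.change))

p <- ggplot(chg, aes(x = time)) +
  geom_point(aes(y = abs.change, colour = "Absolute change")) +
  geom_point(aes(y = rel.change * k, colour = "Relative change (%)")) +
  scale_x_continuous(breaks = 2:3) +
  scale_y_continuous(
    name = "Absolute change",
    # the secondary axis divides by k to show the original percentages
    sec.axis = sec_axis(~ . / k, name = "Relative change (%)")
  ) +
  labs(colour = NULL)
p
```

With many participants it may read better to plot group summaries instead of individual points; the rescaling idea stays the same.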

Nice resource for learning R: https://stackoverflow.com/tags/r/info

Community
Aby
  • You know about precedence of operations? I.e. first `/` or `*` and then `+` or `-`. – jogo Mar 02 '16 at 08:05
  • You know your data.table example line does absolutely not produce your example data.frame ? – Tensibai Mar 02 '16 at 08:11
  • @Tensibai: Nope, looks like I need to improvise it so that people can understand what I am asking – Aby Mar 02 '16 at 08:20
  • Thanks, Tensibai. I get an error with the code you gave. "Error in data.table(df, Outcome = as.integer(runif(9) * 100)) : problem recycling column 1, try a simpler type" – Aby Mar 02 '16 at 08:28
  • Aww, sorry, forgot a line :p `set.seed(123);df=expand.grid(PatientID=1:3,time=1:3);dat <- data.table(df,Outcome=as.integer(runif(9)*100))` – Tensibai Mar 02 '16 at 08:31
  • It is alright. New code for creating data table works, thanks again for your time. There are different values in outcome measures, but that does not matter much. – Aby Mar 02 '16 at 08:38
  • @Dr.AmitBansal think about editing your question to improve it :) – Tensibai Mar 02 '16 at 08:39
  • You could improve your question by showing the effort you invested and where you get stuck. Right now I read it as an analysis request, which I would deem less suitable for stackoverflow but rather something a hired gun would do. – Roman Luštrik Mar 02 '16 at 09:11
  • @Tensibai - I have made a few changes :) – Aby Mar 02 '16 at 13:02
  • @RomanLuštrik - You got me wrong. I have data with the analysis already done, but I want to learn R and reproduce graphs (similar to what people in my field make), so I am asking a few questions about where I got stuck. I hope the edited question clarifies most concerns. If not, please feel free to shoot again! – Aby Mar 02 '16 at 15:11

2 Answers


It's not clear exactly what you mean, but you should be able to adapt this code to your purpose:

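library(data.table)
# rep() spells out the recycling; recent data.table versions only recycle length-1 vectors
dat = data.table(PatientID=rep(1:2, 3), time=rep(1:3, 2), Outcome=c(87, 32,76,21,24, 27))
#Modified so you can actually compare across 2 time periods
#Note your data is already sorted, but to be on the safe side:
setkey(dat,PatientID,time)
#Note: rel.change.* divide by the current Outcome; to match the question's
#(F - B) / B definition, divide by the shifted (earlier) value instead
dat[, `:=`(rel.change.1 = 100 * (Outcome - shift(Outcome)) / Outcome,
           rel.change.2 = 100 * (Outcome - shift(Outcome, 2)) / Outcome,
           abs.change.1 = Outcome - shift(Outcome),
           abs.change.2 = Outcome - shift(Outcome, 2)),
           by = PatientID]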

The key idea is to use shift to get a shift of the Outcome column; the second argument to shift is the number of rows by which to shift it. Combined with grouping by PatientID, and given that we keyed the data.table in order to ensure it was sorted by time within groups of PatientID, this ensures the correct comparison. (Note, if your actual data is not complete, this will not produce correct results. For example, if you have observations at times 1 and 4 for PatientID=1 but 2 and 3 for PatientID = 2, then both 1-shifts will compare these observations even though they are not the same number of time units apart. If this is the case you should use CJ on the ID and time columns to get rows in which you fill NAs for all the missing observations; that will ensure that the shifts reflect the correct time differences.)
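The CJ step mentioned above might look like the following minimal sketch. The incomplete data set `obs` is made up for illustration (patient 1 has no visit at time 2), and `full` is just a name chosen here.

```r
library(data.table)

# Hypothetical incomplete data: patient 1 was seen at times 1 and 3 only
obs <- data.table(PatientID = c(1, 1, 2, 2, 2),
                  time      = c(1, 3, 1, 2, 3),
                  Outcome   = c(87, 76, 21, 32, 27))

# CJ() builds every PatientID x time combination; joining on it
# adds a row with Outcome = NA for each missing visit
full <- obs[CJ(PatientID = obs$PatientID, time = obs$time, unique = TRUE),
            on = .(PatientID, time)]

# A 1-row shift now always means "one time unit earlier"
full[, abs.change.1 := Outcome - shift(Outcome), by = PatientID]
full
```

Changes that involve a missing visit come out as NA rather than silently comparing visits that are a different number of time units apart.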

This produces:

> dat
   PatientID time Outcome rel.change.1 rel.change.2 abs.change.1 abs.change.2
1:         1    1      87           NA           NA           NA           NA
2:         1    2      24   -262.50000           NA          -63           NA
3:         1    3      76     68.42105    -14.47368           52          -11
4:         2    1      21           NA           NA           NA           NA
5:         2    2      32     34.37500           NA           11           NA
6:         2    3      27    -18.51852     22.22222           -5            6

Now, we can melt,

melted <- melt(dat,id.vars=c("PatientID","time"),variable.factor=F)

> melted
    PatientID time     variable      value
 1:         1    1      Outcome   87.00000
 2:         1    2      Outcome   24.00000
 3:         1    3      Outcome   76.00000
 4:         2    1      Outcome   21.00000
 5:         2    2      Outcome   32.00000
 6:         2    3      Outcome   27.00000
 7:         1    1 rel.change.1         NA
 8:         1    2 rel.change.1 -262.50000
 9:         1    3 rel.change.1   68.42105
10:         2    1 rel.change.1         NA
11:         2    2 rel.change.1   34.37500
12:         2    3 rel.change.1  -18.51852
13:         1    1 rel.change.2         NA
14:         1    2 rel.change.2         NA
15:         1    3 rel.change.2  -14.47368
16:         2    1 rel.change.2         NA
17:         2    2 rel.change.2         NA
18:         2    3 rel.change.2   22.22222
19:         1    1 abs.change.1         NA
20:         1    2 abs.change.1  -63.00000
21:         1    3 abs.change.1   52.00000
22:         2    1 abs.change.1         NA
23:         2    2 abs.change.1   11.00000
24:         2    3 abs.change.1   -5.00000
25:         1    1 abs.change.2         NA
26:         1    2 abs.change.2         NA
27:         1    3 abs.change.2  -11.00000
28:         2    1 abs.change.2         NA
29:         2    2 abs.change.2         NA
30:         2    3 abs.change.2    6.00000
    PatientID time     variable      value

And plot

library(ggplot2)
ggplot(melted,aes(x=time,y=value,color=factor(PatientID))) +
    geom_point() +
    facet_wrap(~variable,scales="free") +
    labs(color="PatientID")

[Figure: example plot of raw data and all computed variables in facets]

David Arenburg
Philip
  • If your grouping variable is always `PatientID`, why are you running this 4 times? Can't you just create them all in a single run? – David Arenburg Mar 02 '16 at 08:36
  • Good point, and in fact you could even functionalize the operation to create abs/rel change in terms of `n` and `lapply` to create the shifts for each `n` in `1:n`. But as this is a toy example for a novice user who stated he had a total of 200 obs groups (negating any need for speed), and who may run line-by-line in a REPL to learn what each line does, I think OP may have an easier time following this syntax than a one-liner like `dat[,c("...","...","...","..."):=.(...,...,...,...),by=...]` (let alone something like `lapply(1:2,function(n) dat[,c("..",".."):=.(eval(parse(text=...)),...)`). – Philip Mar 02 '16 at 08:47
  • That said, please feel free to edit my post to add the example of doing this with only one pass. – Philip Mar 02 '16 at 08:49
  • Many thanks, Philip. Adding PatientID, in color aesthetics, might make it hard to understand the data as there are hundreds of participants, and I am basically looking at overall changes in outcome measure with respect to baseline time point (not individual changes). I have a separate column for grouping variable, "Group". I will also compare two groups, but I have not mentioned that in the main question. Thank again for taking your time. – Aby Mar 02 '16 at 09:02
  • Yes, I didn't mean for you literally to use the graph example I posted, just to get a sense of what's out there. Check out `stat_summary`, in particular given that you have groups you will probably want to `group` or `color` by `Group` and a `stat_summary` can average (or whatever you want) over all the observations in that group. (BTW, I understand now that you meant all your comparisons to be to time-1 rather than at fixed intervals of time of given lengths, but didn't realize this before, so I see that my answer isn't quite applicable to your case.) – Philip Mar 02 '16 at 09:14
  • @DavidArenburg - It is a good alternative, thanks. As @Philip guessed, it's easier for me to run `codes` line-by-line and I prefer to understand my script/code, but I also need to learn more about loops and lapply. So any suggestion, or a brief code to play with, is welcome! Learning by trying :) – Aby Mar 02 '16 at 19:03

Another approach:

library(data.table)
library(ggplot2)

set.seed(123)
df = expand.grid(PatientID = 1:3,time = 1:3)
dat <- data.table(df,Outcome = as.integer(runif(9) * 100))

setkeyv(dat,"PatientID")
# change relative to each patient's baseline (time == 1)
dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID]
# rel.change is a fraction of baseline; multiply by 100 for a percentage
dat[, rel.change := abs.change / Outcome[time == 1], by = PatientID]

ggplot(melt(dat,c('PatientID','time')), aes(x = time,y = value,color = factor(PatientID))) + 
  geom_line() + 
  facet_wrap( ~ variable,scales = "free")

Which gives (drawing borrowed from @Philip's answer):

[Figure: faceted line plots, one panel per variable]

You can chain the two steps of adding columns like this (but it's less readable):

dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID][, rel.change := abs.change / Outcome[time == 1], by = PatientID]
Tensibai
  • Thanks, Tensibai. I am again getting error message with "setkey(dat,"PatientID")". Error message - "Error in setkeyv(x, cols, verbose = verbose, physical = physical) : x is not a data.table". Any thoughts? Nice plots, how do I put relative change (percentage) & absolute change values in the graphs? Abuse legend option or is there a legit way to do this? – Aby Mar 02 '16 at 08:53
  • Try a `setDT(dat)` for your real data, (which sounds to be a data.frame then). I don't get your question about the plots, there's 3 plots, one for each "variable", I assume this become a different question. – Tensibai Mar 02 '16 at 08:56
  • The syntax is either `setkey(dat,PatientID)` or `setkeyv(dat,"PatientID")`. Use `v` when the argument is a character string. Make sure that `dat` actually *is* a `data.table` (check with `class(dat)`). – Philip Mar 02 '16 at 08:56
  • @Philip good catch, my session didn't complain about it, fixing the answer. – Tensibai Mar 02 '16 at 08:58
  • Great, thanks. I changed `dat` from a data.frame to data.table. After running `dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID]`, I received an error: "There were 50 or more warnings (use warnings() to see the first 50)". Then, I ran ggplot command, I got another error: "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic? Warning message: attributes are not identical across measure variables; they will be dropped". Still checking alternatives. Thanks again! – Aby Mar 02 '16 at 12:10