I'm working with a data frame that contains experimental data. For the purposes of this post we can limit the discussion to 5 columns: ExperimentID, ROI, isContrast, isTreated, and Value. ROI is a text-based factor that indicates where a region of interest is drawn, e.g. 'ROI_1', 'ROI_2', etc. isTreated and isContrast are binary fields indicating whether or not some treatment was applied. I want to make a scatter plot comparing the values of, e.g., 'ROI_1' vs. 'ROI_2', which means I need the data paired so that the first X value is from Experiment_1 and ROI_1, the first Y value is from Experiment_1 and ROI_2, the next X value is from Experiment_2 and ROI_1, the next Y value is from Experiment_2 and ROI_2, etc. I only want to make this comparison for common values of isContrast and isTreated (i.e. 1 plot for each combination of these variables, so 4 plots altogether).

Subsetting doesn't solve my problem because data from different experiments/ROIs was sometimes entered out of numerical order.

The following code produces a mock data set to demonstrate the problem:

expID =      c('Bob','Bob','Bob','Bob','Lisa','Lisa','Lisa','Lisa','Alice','Alice','Alice','Alice','Joe','Joe','Joe','Joe','Bob','Bob','Alice','Alice','Lisa','Lisa')
treated  = c(0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0)
contrast = c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
val = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,6,7,8,9,10,11)
roi = c(rep('A',16),'B','B','B','B','B','B')

myFrame = data.frame(ExperimentID=expID,isTreated = treated, isContrast= contrast,Value = val, ROI=roi)

     ExperimentID isTreated isContrast Value ROI
1           Bob         0          0     1   A
2           Bob         0          1     2   A
3           Bob         1          0     3   A
4           Bob         1          1     4   A
5          Lisa         0          0     1   A
6          Lisa         0          1     2   A
7          Lisa         1          0     3   A
8          Lisa         1          1     4   A
9         Alice         0          0     1   A
10        Alice         0          1     2   A
11        Alice         1          0     3   A
12        Alice         1          1     4   A
13          Joe         0          0     1   A
14          Joe         0          1     2   A
15          Joe         1          0     3   A
16          Joe         1          1     4   A
17          Bob         0          0     6   B
18          Bob         0          1     7   B
19        Alice         0          0     8   B
20        Alice         0          1     9   B
21         Lisa         0          0    10   B
22         Lisa         0          1    11   B

Now let's say I want to scatter plot values for A vs. B. That is to say, I want to plot x vs. y where {(x,y)} = {(Bob's Value from ROI A, Bob's Value from ROI B), (Alice's Value from ROI A, Alice's Value from ROI B), ...}, and each pair must have the same values for isTreated and isContrast for the comparison to make sense. Now, if I just go and subset I'll get something like:

> x= myFrame$Value[(myFrame$ROI == 'A') & (myFrame$isTreated == 0) & (myFrame$isContrast == 0)]
> x
[1] 1 1 1 1

> y= myFrame$Value[(myFrame$ROI == 'B') & (myFrame$isTreated == 0) &   (myFrame$isContrast == 0)]
> y
 [1]  6  8 10

As you can see, the values in x correspond to the first rows of Bob, Lisa, Alice and Joe, respectively, but the values in y correspond to Bob, Alice and Lisa, respectively, and there is no value for Joe.

So say I ignored the value for Joe because that data is missing for B and just decided to plot the first 3 values of x vs. the first 3 values of y. The data are still out of order because x = (Bob, Lisa, Alice) but y = (Bob, Alice, Lisa) in terms of where the values are coming from. So I would like to know how to make vectors such that the order is correct and the plot makes sense.
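
To make the mismatch concrete, here is a small sketch that applies the same logical masks to ExperimentID instead of Value. The mock data is rebuilt in compact form so the block runs on its own; the names keep, idsX and idsY are only illustrative.

```r
# Compact rebuild of the mock data set shown above (same rows, same order)
expID    <- c(rep(c('Bob', 'Lisa', 'Alice', 'Joe'), each = 4),
              'Bob', 'Bob', 'Alice', 'Alice', 'Lisa', 'Lisa')
treated  <- c(rep(c(0, 0, 1, 1), 4), rep(0, 6))
contrast <- rep(c(0, 1), 11)
val      <- c(rep(c(1, 2, 3, 4), 4), 6:11)
roi      <- c(rep('A', 16), rep('B', 6))
myFrame  <- data.frame(ExperimentID = expID, isTreated = treated,
                       isContrast = contrast, Value = val, ROI = roi)

# Same condition as in the subsetting above, but returning the IDs
keep <- myFrame$isTreated == 0 & myFrame$isContrast == 0
idsX <- as.character(myFrame$ExperimentID[keep & myFrame$ROI == 'A'])
idsY <- as.character(myFrame$ExperimentID[keep & myFrame$ROI == 'B'])

idsX  # "Bob" "Lisa" "Alice" "Joe"
idsY  # "Bob" "Alice" "Lisa"
```

The two ID vectors differ in both length and order, which is exactly why plotting x against y position by position would pair the wrong experiments.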

Brian Tompsett - 汤莱恩
  • When subsetting, order doesn't matter. Maybe you can make your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? What have you tried? – Gregor Thomas Dec 05 '13 at 23:21
  • I realize that when subsetting order doesn't matter, but I'm not sure of any other way to go about this other than a for loop which loops over all possible values of a ExperimentID and ROI and manually sorts these into x and y vectors based on the value of ROI. My gut feeling is that there must be a better way to go about this. My data frame is pretty large and contains unpublished data so I don't want to post it here. – user3072374 Dec 05 '13 at 23:26
  • I'm pretty confused if you say you understand that order doesn't matter with subsetting, but you also say "Subsetting doesn't solve my problem because data... was sometimes entered out of numerical order." – Gregor Thomas Dec 05 '13 at 23:31
  • I took a long shot with this question. It is unpublished patient data so it cannot be copied onto the internet. I realize my question would be clarified by data but that is not possible. Thank you, anyway, for your help. – user3072374 Dec 05 '13 at 23:37
  • What I mean by subsetting doesn't solve the problem is that, e.g. – user3072374 Dec 05 '13 at 23:37
  • You may be unable to share the data, but you could share your code, which would go a long way to helping us help you. I would strongly suggest following the reproducible example link above and posting your code with a toy dataset that has the same structure as your data, but with made-up data. – Michael Dec 05 '13 at 23:39
  • Listen to Michael. You need to post some data, but it doesn't have to be real data. Take 3 minutes to make a 10-row data frame with `ROI = paste("ROI", 1:10, sep = "_")`, `isTreated = runif(10) < 0.5` and `Value = rnorm(10)`, adding whatever is needed to give a sense of your problem. – Gregor Thomas Dec 05 '13 at 23:43
  • I have updated my question with more information and will work on producing a mock data frame that demonstrates the problem. – user3072374 Dec 05 '13 at 23:46

2 Answers

Similar to @Matthew, with ggplot:

The idea is to reshape your data so that the values from ROI=A and ROI=B are in different columns. This can be done (with your sample data) as follows:

library(reshape2)
zz <- dcast(myFrame,
            value.var="Value",
            formula=ExperimentID+isTreated+isContrast~ROI)
zz
   ExperimentID isTreated isContrast A  B
1         Alice         0          0 1  8
2         Alice         0          1 2  9
3         Alice         1          0 3 NA
4         Alice         1          1 4 NA
5           Bob         0          0 1  6
6           Bob         0          1 2  7
7           Bob         1          0 3 NA
8           Bob         1          1 4 NA
9           Joe         0          0 1 NA
10          Joe         0          1 2 NA
11          Joe         1          0 3 NA
12          Joe         1          1 4 NA
13         Lisa         0          0 1 10
14         Lisa         0          1 2 11
15         Lisa         1          0 3 NA
16         Lisa         1          1 4 NA

Notice that your sample data is rather sparse (lots of NAs).

To plot:

library(ggplot2)
ggplot(zz,aes(x=A,y=B,color=factor(isTreated))) + 
  geom_point(size=4)+facet_wrap(~isContrast)

Produces this:

The reason there are no blue points is that, in your sample data, there are no occurrences of isTreated=1 and ROI=B.
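
If you would rather avoid the reshape2 dependency, or only want the rows where both ROIs are present, a base R merge on the key columns is one possible alternative. This is a minimal sketch, not the method used above: the names keys, a, b and paired are illustrative, and the mock data is rebuilt in compact form so the block is self-contained.

```r
# Compact rebuild of the question's mock data (same rows, same order)
myFrame <- data.frame(
  ExperimentID = c(rep(c('Bob', 'Lisa', 'Alice', 'Joe'), each = 4),
                   'Bob', 'Bob', 'Alice', 'Alice', 'Lisa', 'Lisa'),
  isTreated    = c(rep(c(0, 0, 1, 1), 4), rep(0, 6)),
  isContrast   = rep(c(0, 1), 11),
  Value        = c(rep(c(1, 2, 3, 4), 4), 6:11),
  ROI          = c(rep('A', 16), rep('B', 6))
)

# Columns that identify one measurement condition
keys <- c('ExperimentID', 'isTreated', 'isContrast')

# Split by ROI, keeping only the keys and the measured value
a <- myFrame[myFrame$ROI == 'A', c(keys, 'Value')]
b <- myFrame[myFrame$ROI == 'B', c(keys, 'Value')]

# Inner join: keeps only key combinations that occur in both ROIs,
# so every row is a matched (A, B) pair regardless of entry order
paired <- merge(a, b, by = keys, suffixes = c('.A', '.B'))
paired
```

With the sample data this leaves six matched rows (Alice, Bob and Lisa, each for isContrast 0 and 1), in columns Value.A and Value.B. Passing all.x = TRUE to merge would instead keep the unmatched A rows with NA in Value.B, similar to the dcast result.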

jlhoward

Something like this, perhaps:

myFrameReshaped <- reshape(myFrame, timevar='ROI', direction='wide', idvar=c('ExperimentID','isTreated','isContrast'))
plot(Value.B ~ Value.A, data=myFrameReshaped)

To condition by the isTreated and isContrast variables, lattice comes in handy:

library(lattice)
xyplot(Value.B~Value.A | isTreated + isContrast, data=myFrameReshaped)

Values that are not present for one of the conditions give NA, and are not plotted.

head(myFrameReshaped)
##   ExperimentID isTreated isContrast Value.A Value.B
## 1          Bob         0          0       1       6
## 2          Bob         0          1       2       7
## 3          Bob         1          0       3      NA
## 4          Bob         1          1       4      NA
## 5         Lisa         0          0       1      10
## 6         Lisa         0          1       2      11
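
If you need the paired vectors themselves rather than a plot, they can be read off the reshaped frame. The following is a rough sketch under the assumption that incomplete pairs should simply be dropped; the names complete, groups, x and y are illustrative, and the mock data is rebuilt so the block runs on its own.

```r
# Compact rebuild of the question's mock data (same rows, same order)
myFrame <- data.frame(
  ExperimentID = c(rep(c('Bob', 'Lisa', 'Alice', 'Joe'), each = 4),
                   'Bob', 'Bob', 'Alice', 'Alice', 'Lisa', 'Lisa'),
  isTreated    = c(rep(c(0, 0, 1, 1), 4), rep(0, 6)),
  isContrast   = rep(c(0, 1), 11),
  Value        = c(rep(c(1, 2, 3, 4), 4), 6:11),
  ROI          = c(rep('A', 16), rep('B', 6))
)

# Same reshape call as above: one row per experiment and condition
myFrameReshaped <- reshape(myFrame, timevar = 'ROI', direction = 'wide',
                           idvar = c('ExperimentID', 'isTreated', 'isContrast'))

# Keep only rows where both ROIs have a value
complete <- myFrameReshaped[complete.cases(myFrameReshaped[, c('Value.A', 'Value.B')]), ]

# One data frame per (isTreated, isContrast) combination that actually occurs;
# list names have the form "<isTreated>.<isContrast>"
groups <- split(complete, list(complete$isTreated, complete$isContrast), drop = TRUE)

# Paired vectors for isTreated == 0 and isContrast == 0
x <- groups[['0.0']]$Value.A
y <- groups[['0.0']]$Value.B
```

Because x and y come from the same rows, element i of each belongs to the same ExperimentID (here Bob, Lisa, Alice, in that order), so plot(x, y) pairs them correctly.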
Matthew Lundberg