
I need to make a scatter plot of Observed vs Predicted data for each Variable using the facet_wrap functionality of ggplot. I might be close, but I am not there yet. I used a suggestion from an answer to my previous question to gather the data and automate the plotting process. Here is my code so far. I understand that the aes of my ggplot is wrong, but I wrote it that way on purpose to make my point clear. I would also like to add geom_smooth to show the confidence interval.

library(tidyverse)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))

DF1$df <- "Observed"
DF2$df <- "Predicted"

DF = rbind(DF1,DF2)
DF_long = gather(DF, key = "Variable", value = "Value", -df)

ggplot(DF_long, aes(x = Observed, y = Predicted))+
  geom_point() +   facet_wrap(Variable~.)+ geom_smooth()

I should see a plot like the one below, comparing Observed vs Predicted for each Variable.

[Figure: expected plot, one Observed vs Predicted scatter panel per Variable]
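A quick way to see why `aes(x = Observed, y = Predicted)` has nothing to map to is to inspect the columns that `gather()` produces: the Observed/Predicted labels end up as values of the `df` column, not as column names. A minimal check, assuming tidyr is installed and re-creating the data above with a seed:

```r
library(tidyr)

set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF_long <- gather(rbind(DF1, DF2), key = "Variable", value = "Value", -df)

# "Observed" and "Predicted" are values in the df column, not columns,
# so aes(x = Observed, y = Predicted) cannot find them
print(names(DF_long))
print(table(DF_long$df))
```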

zx8754
CForClimate
    The goal is to `scatter plot` observed versus predicted for each variable. That is, the plot at the `top left` should show a `scatter plot` of observed vs predicted data for `Variable A`, and the plot at the `top right` should show a `scatter plot` of observed vs predicted for `Variable B`; likewise for the other two variables. With my current `ggplot` code, I am getting nowhere. – CForClimate Oct 31 '19 at 20:46
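Since facet_wrap() fills panels row by row in the sorted order of the facet variable, setting `ncol = 2` gives exactly this arrangement: A top left, B top right, C bottom left, D bottom right. A sketch under that assumption, using made-up paired data in the Variable/Observed/Predicted layout that the answers below construct:

```r
library(ggplot2)

set.seed(1)
# Hypothetical paired data: one row per observation, 12 rows per variable
plotDat <- data.frame(
  Variable  = rep(c("A", "B", "C", "D"), each = 12),
  Observed  = runif(48, 1, 10),
  Predicted = runif(48, 1, 10)
)

p <- ggplot(plotDat, aes(x = Observed, y = Predicted)) +
  geom_point() +
  facet_wrap(~ Variable, ncol = 2)  # 2 columns: A, B on top; C, D below

# Inspect where each panel was placed in the grid
panels <- ggplot_build(p)$layout$layout
print(panels[, c("Variable", "ROW", "COL")])
```

The ROW/COL columns of the built layout are a convenient check of panel placement before styling the plot further.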

2 Answers


We need to reshape each data frame to long format separately and then cbind them, so that Observed (x) and Predicted (y) end up as separate columns; then facet. See this example:

library(ggplot2)
library(tidyr)  # for gather()

# reproducible data with seed
set.seed(1)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))

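# Reshape each data frame to long format separately
DF1_long <- gather(DF1, key = "group", value = "Observed")
DF2_long <- gather(DF2, key = "group", value = "Predicted")
# Rows line up by position, so bind the Predicted column next to Observed
plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])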

head(plotDat)
#   group Observed Predicted
# 1     A 3.389578 10.590824
# 2     A 4.349115 10.234584
# 3     A 6.155680  8.298577
# 4     A 9.173870 11.750885
# 5     A 2.815137  7.942874
# 6     A 9.085507  6.203175


ggplot(plotDat, aes(x = Observed, y = Predicted))+
  geom_point() +
  facet_wrap(group~.) +
  geom_smooth()

[Figure: resulting plot, one Observed vs Predicted panel per group with a smoother]
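Because cbind() pairs rows purely by position, it relies on DF1_long and DF2_long listing groups and rows in the same order. A sketch of an alternative that pairs rows by an explicit key instead, assuming tidyr 1.0 or later (for pivot_longer() and pivot_wider()); the `id` column is a hypothetical row index added only for the pairing:

```r
library(tidyr)

set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

# Label each source and add a hypothetical row id so rows can be matched
DF1$df <- "Observed";  DF1$id <- seq_len(nrow(DF1))
DF2$df <- "Predicted"; DF2$id <- seq_len(nrow(DF2))

# Stack both sources, then go long over the measured variables
long <- pivot_longer(rbind(DF1, DF2), cols = A:D,
                     names_to = "group", values_to = "value")

# One row per (id, group), with Observed and Predicted as separate columns;
# plotDat can then go into the same ggplot() call as above
plotDat <- pivot_wider(long, names_from = df, values_from = value)
head(plotDat)
```

This keeps the question's rbind + label idea; the id column is what lets the wide step know which Observed value belongs with which Predicted value.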

We can use ggpubr to add P and R values to the plot; see the answers in this post.
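ggpubr's stat_cor() layer is one way to print these values on each panel. To see the numbers such a Pearson-based annotation reports, or to get them without ggpubr, here is a base R sketch that computes Pearson's R and its p-value per group with cor.test(), on made-up data in the same Observed/Predicted layout:

```r
set.seed(1)
# Hypothetical data in the group/Observed/Predicted layout used above
plotDat <- data.frame(
  group    = rep(c("A", "B", "C", "D"), each = 12),
  Observed = runif(48, 1, 10)
)
plotDat$Predicted <- plotDat$Observed + rnorm(48)

# Pearson correlation and its p-value, computed separately for each group
cor_by_group <- do.call(rbind, lapply(split(plotDat, plotDat$group), function(d) {
  ct <- cor.test(d$Observed, d$Predicted, method = "pearson")
  data.frame(group = d$group[1], R = unname(ct$estimate), p = ct$p.value)
}))
print(cor_by_group)
```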

zx8754

Similarly, consider merge() on reshaped data frames using base R's reshape() (avoiding any tidyr dependency, in case you are a package author). Below, lapply + Reduce merge the reshaped frames dynamically, which avoids creating helper objects such as DF1_long and DF2_long in the global environment:

Data

set.seed(10312019)

DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), 
                 C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), 
                 C = runif(12, 3,12), D = runif(12, 4,8))

Plot

library(ggplot2)      # ONLY IMPORTED PACKAGE

DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1, DF2)

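# Reshape each data frame to long format (its value column is named after
# its "df" label), then merge the long frames on Variable and row id
DF_long <- Reduce(function(x,y) merge(x, y, by=c("Variable", "id")),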
                  lapply(list(DF1, DF2), function(df) 
                       reshape(df, varying=names(DF)[1:(length(names(DF))-1)], 
                               times=names(DF)[1:(length(names(DF))-1)],
                               v.names=df$df[1], timevar="Variable", drop="df",
                               new.row.names=1:1E5, direction="long")
                  )
           )
head(DF_long)
#   Variable id Observed Predicted
# 1        A  1 6.437720 11.338586
# 2        A 10 4.690934  9.861456
# 3        A 11 6.116200  9.020343
# 4        A 12 6.499371  5.904779
# 5        A  2 6.779087  5.901970
# 6        A  3 6.499652  8.557102 
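The ids in the output above run 1, 10, 11, 12, 2, ...: with the default `sort = TRUE` and several `by` columns, merge() sorts on the key columns treated as text, so ids come out in character order rather than numeric order. This does not affect the plot, but if numeric order matters for display, a small base R sketch (with hypothetical frames `x` and `y` mirroring the merge above) shows the reordering:

```r
# Small hypothetical example mirroring the merge in the answer above
x <- data.frame(Variable = "A", id = 1:12, Observed  = 1:12)
y <- data.frame(Variable = "A", id = 1:12, Predicted = 12:1)

m <- merge(x, y, by = c("Variable", "id"))
print(m$id)  # ids in text order of the combined key

# Restore numeric order of id within each Variable
m <- m[order(m$Variable, m$id), ]
rownames(m) <- NULL
print(m$id)
```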


ggplot(DF_long, aes(x = Observed, y = Predicted)) +
  geom_point() + geom_smooth() + facet_wrap(Variable~.)

[Figure: resulting plot, one Observed vs Predicted panel per Variable with a smoother]
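For Observed vs Predicted plots, a straight-line fit plus a 1:1 reference line is often easier to read than the default smoother: with fewer than 1,000 points, geom_smooth() falls back to loess. A sketch of that variant, using made-up data in the merged DF_long layout: `method = "lm"` draws a linear fit with its confidence band (`se = TRUE` is the default, at the 95% level), and geom_abline() adds the y = x line where perfect predictions would sit.

```r
library(ggplot2)

set.seed(10312019)
# Hypothetical data in the same layout as DF_long above
DF_long <- data.frame(
  Variable = rep(c("A", "B", "C", "D"), each = 12),
  Observed = runif(48, 1, 10)
)
DF_long$Predicted <- DF_long$Observed + rnorm(48)

p <- ggplot(DF_long, aes(x = Observed, y = Predicted)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +   # linear fit + CI band
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") + # perfect-prediction line
  facet_wrap(~ Variable)

print(p)
```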

Parfait
  • Thank you, I am new to R and still trying to understand its amazing functionality. Answers like this help me explore further. – CForClimate Oct 31 '19 at 21:32
  • I guess this would be another question, but if one wanted to compare the `cumulative distribution function (cdf)` of both `data.frame`s, how would one go about that? – CForClimate Oct 31 '19 at 21:37
  • Yes that would be another question. Please research and make an attempt at such a solution. – Parfait Oct 31 '19 at 23:19