I have the following data frames:

df1<-data.frame(id_1=c(1,2,3,4,5),
                value1=c(0,0.2,0.5,0.8,0),
                value2=c(0.1,0.3,0.5,0.7,0.8),
                value3=c(0.5,0.6,0.3,0.2,0.1))

df2<-data.frame(id_2=c(1,2,3,4,5),
                value1=c(0,0.2,0.5,0.8,0),
                value2=c(0.1,0.1,0.5,0.6,0.7),
                value3=c(0.4,0.4,0.8,0.9,0.2))

I want to make the following plot:

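library(ggplot2)

## note: geom_point() takes `mapping` first, so `data` must be passed by name
ggplot(data.frame(x=df1$value1, y=df2$value1), aes(x=x, y=y)) + 
       geom_point() + 
       geom_point(data=data.frame(x=df1$value2, y=df2$value2), mapping=aes(x=x, y=y)) + 
       geom_point(data=data.frame(x=df1$value3, y=df2$value3), mapping=aes(x=x, y=y))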

How can I make that plot without copy-pasting geom_point() for each value column? And afterwards, how can I find the correlation coefficient for the variables in the final overlaid plot?

Any help would be much appreciated, thanks!

user1490
  • Like [this](https://stackoverflow.com/q/60721141/5325862)? Or [this](https://stackoverflow.com/q/55887320/5325862) or [this](https://stackoverflow.com/q/6525864/5325862)? Once you've bound the data frames, just use `cor` to get the coefficient – camille Jan 18 '22 at 15:50

1 Answer

You need to combine your data into one data frame. Here's one way:

## make column names the same
## and add columns indicating the data frame source
df1$var = "x"
df2$var = "y"
names(df1)[1] = "id"
names(df2)[1] = "id"

## put the data together
df = rbind(df1, df2)

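## reshape the data: go long first (one row per id/var/value column),
## then spread `var` so each data frame's values become the x and y columns
library(tidyr)
df = pivot_longer(df, starts_with("value"))
df = pivot_wider(df, names_from = "var", values_from = "value")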
df
# # A tibble: 15 × 4
#       id name       x     y
#    <dbl> <chr>  <dbl> <dbl>
#  1     1 value1   0     0  
#  2     1 value2   0.1   0.1
#  3     1 value3   0.5   0.4
#  4     2 value1   0.2   0.2
#  5     2 value2   0.3   0.1
#  6     2 value3   0.6   0.4
#  7     3 value1   0.5   0.5
#  8     3 value2   0.5   0.5
#  9     3 value3   0.3   0.8
# 10     4 value1   0.8   0.8
# 11     4 value2   0.7   0.6
# 12     4 value3   0.2   0.9
# 13     5 value1   0     0  
# 14     5 value2   0.8   0.7
# 15     5 value3   0.1   0.2

Once your data is in a tidy format, plotting is simple. You could further customize your plot by mapping the `name` column to the shape or color aesthetic to show which value column each point came from.

library(ggplot2)

ggplot(df, aes(x = x, y = y)) +
  geom_point()
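
As a rough sketch of the color customization mentioned above, the `name` column can be mapped to `color` so that each original value column gets its own color. The data frame `df_long` here is a hypothetical stand-in that retypes the reshaped tibble printed above, so that the snippet runs on its own; with the real data you would pass `df` directly.

```r
library(ggplot2)

# Hypothetical stand-in for the reshaped `df` above (same 15 rows)
df_long <- data.frame(
  id   = rep(1:5, each = 3),
  name = rep(c("value1", "value2", "value3"), times = 5),
  x    = c(0, 0.1, 0.5, 0.2, 0.3, 0.6, 0.5, 0.5, 0.3, 0.8, 0.7, 0.2, 0, 0.8, 0.1),
  y    = c(0, 0.1, 0.4, 0.2, 0.1, 0.4, 0.5, 0.5, 0.8, 0.8, 0.6, 0.9, 0, 0.7, 0.2)
)

# One geom_point() layer; color distinguishes the original value columns
p <- ggplot(df_long, aes(x = x, y = y, color = name)) +
  geom_point()
p
```

Swapping `color = name` for `shape = name` should work the same way if the plot needs to stay readable in grayscale.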

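
The question also asks about the correlation coefficient, and the comment under it suggests `cor`. A minimal sketch, assuming the reshaped data has `x`, `y`, and `name` columns as printed above (`df_long` is again a hypothetical stand-in for `df`, and `r_all` and `r_by_column` are illustrative names):

```r
# Hypothetical stand-in for the reshaped `df` above (same 15 rows)
df_long <- data.frame(
  name = rep(c("value1", "value2", "value3"), times = 5),
  x    = c(0, 0.1, 0.5, 0.2, 0.3, 0.6, 0.5, 0.5, 0.3, 0.8, 0.7, 0.2, 0, 0.8, 0.1),
  y    = c(0, 0.1, 0.4, 0.2, 0.1, 0.4, 0.5, 0.5, 0.8, 0.8, 0.6, 0.9, 0, 0.7, 0.2)
)

# Pearson correlation (the default method) across all plotted points
r_all <- cor(df_long$x, df_long$y)

# Correlation computed separately for each original value column
r_by_column <- sapply(
  split(df_long, df_long$name),
  function(d) cor(d$x, d$y)
)

r_all
r_by_column
```

Since `value1` is identical in both source data frames, its per-column coefficient should come out as 1. Pass `method = "spearman"` to `cor` if a rank-based coefficient is more appropriate.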

Gregor Thomas