
I have two data frames, each with dimensions 24,523 × 3,468, and I want a scatter plot of them (data frame 1 on the x-axis and data frame 2 on the y-axis). Then I want to add a Loess line.

I can simply use plot() to get the scatter plot, but I do not know how to add a Loess line to it. I also found that when the data for each axis is a single vector rather than a data frame, this can be done directly with stat_smooth() from the ggplot2 package.

My questions are: 1) How do I get a scatter plot of two data frames using ggplot()? 2) How do I add a Loess line to a scatter plot generated from two data frames?

This is the scatter plot I get using:

plot(as.matrix(spatial_data_glio_df_intersection_genes), as.matrix(estimated_all_gene_read_counts_spatial), xlab = "true_gene_read_counts", ylab = "estimated_gene_read_counts")

The data can be accessed via this link: data.

MK Huda
  • @csgroen It does not, unfortunately. Your suggestion deals with plotting the data of two single vectors, right? While my data are two data frames. – MK Huda Jul 12 '22 at 10:18
  • Please post some minimal data: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – harre Jul 12 '22 at 11:42
  • @harre I provided the link to access the data. Can't you access it? – MK Huda Jul 12 '22 at 11:46
  • Each file is 40MB. A more minimal example would be appreciated – harre Jul 12 '22 at 11:53
  • @harre Ow, that is what you meant. I will try to provide one. Thank you. – MK Huda Jul 12 '22 at 12:11

1 Answer

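Just flatten the two data frames into vectors with as.vector() (if they are real data frames rather than matrices, convert them with as.matrix() first, as in your plot() call). I've made a minimal reproducible example using random data. The first plot corresponds more or less to what you have currently; the second one hopefully corresponds to the desired output: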

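library(ggplot2)

set.seed(42)  # make the random example reproducible

# Random matrices standing in for the two data frames
df1 <- matrix(rnorm(1000), nrow = 100)
df2 <- matrix(rnorm(1000), nrow = 100)

# Base R scatter plot, roughly what you have now
plot(df1, df2, xlab = "true_gene_read_counts", ylab = "estimated_gene_read_counts")

# Flatten both matrices into vectors and pair them up in one data frame
joint_df <- data.frame(df1 = as.vector(df1), df2 = as.vector(df2))

# ggplot2 scatter plot with a Loess smoother on top
ggplot(joint_df, aes(df1, df2)) +
    geom_point() +
    geom_smooth(method = "loess") +
    labs(x = "true_gene_read_counts", y = "estimated_gene_read_counts") +
    theme_linedraw()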

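For actual data frames of the size in the question (24,523 × 3,468, roughly 85 million points), two adjustments are probably needed. This is only a sketch under assumptions: true_df and est_df are hypothetical stand-ins for your two data frames, and the subset size of 1,000 is an arbitrary choice. Calling as.vector() directly on a data frame does not give a flat numeric vector, so the sketch goes through as.matrix() first. Loess also scales poorly with the number of points, so the smoother is fitted on a random subset rather than on all pairs.

```r
library(ggplot2)

set.seed(1)

# Toy stand-ins for the two real data frames (names are hypothetical)
true_mat <- matrix(rnorm(2000), nrow = 200)
true_df  <- as.data.frame(true_mat)
est_df   <- as.data.frame(true_mat + rnorm(2000, sd = 0.5))

# Convert to matrix first, then flatten column by column;
# both frames are flattened in the same order, so the pairs stay aligned
joint_df <- data.frame(
  true_counts      = as.vector(as.matrix(true_df)),
  estimated_counts = as.vector(as.matrix(est_df))
)

# Fit the Loess curve on a random subset only (size is an arbitrary choice)
n_fit  <- min(nrow(joint_df), 1000)
fit_df <- joint_df[sample(nrow(joint_df), n_fit), ]

p <- ggplot(joint_df, aes(true_counts, estimated_counts)) +
  geom_point(alpha = 0.2) +
  geom_smooth(data = fit_df, method = "loess", formula = y ~ x) +
  labs(x = "true_gene_read_counts", y = "estimated_gene_read_counts") +
  theme_linedraw()

print(p)
```

With tens of millions of points, drawing every point is slow as well, so you may also want to plot only a subset in geom_point() or switch to a binned layer such as geom_bin2d().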

csgroen
  • I see, so the idea is to convert the data frames into vectors. Got it. Yes, this is what I expected. I will try it with my data to see how it turns out. Thank you very much. – MK Huda Jul 12 '22 at 12:24