
I am currently working with a big biological dataset with many datapoints. The head() function in R gives me the following column names:

intensity - Sample - Acession - Study - Dx

intensity is the only numeric column. The others are character.

First, I converted all factor columns to character and stored the result in the following df: unfactordata. Next, I want to make a scatterplot of a specific subset of the data, with a geom_smooth line drawn between the groups. I use the following code:

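library(ggplot2)

scatplotprot <- function(name) {
  # Keep only the rows for the requested accession
  proteinname <- subset(unfactordata, Acession == name)

  # Scatterplot of intensity per Dx group, coloured by Study
  p <- ggplot(data = proteinname, aes(x = Dx, y = intensity, color = Study)) +
    geom_point() +
    geom_smooth(method = 'lm', aes(group = Dx))

  return(p)
}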

This does give me a scatterplot of all the intensity values for the 2 groups (Dx), coloured by the Study each datapoint originates from. However, it does not show a line between the two groups (Dx). Depending on which Acession I call, I expect to see between 3 and 8 lines.
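
For reference, here is a minimal sketch of one way the expected lines might be drawn. It rests on assumptions that are not confirmed above: that one line per Study is wanted, and that each Study contains both Dx levels. The data frame toydata, the accession "P12345" and the function scatplotprot_by_study are all hypothetical and only stand in for the real data. The idea is that with group = Dx every group sits at a single x position, so lm has no slope to fit, whereas grouping by Study gives each fit two x positions.

```r
library(ggplot2)

# Hypothetical toy data with the same columns as described above
set.seed(1)
toydata <- data.frame(
  intensity = rnorm(24, mean = 10),
  Sample    = paste0("S", 1:24),
  Acession  = "P12345",
  Study     = rep(c("StudyA", "StudyB", "StudyC"), each = 8),
  Dx        = rep(c("Control", "Case"), times = 12),
  stringsAsFactors = FALSE
)

# Hypothetical variant of scatplotprot that takes the data as an argument
scatplotprot_by_study <- function(data, name) {
  proteinname <- subset(data, Acession == name)

  ggplot(proteinname, aes(x = Dx, y = intensity, color = Study)) +
    geom_point() +
    # Assumption: one line per Study; each Study spans both Dx levels,
    # so every linear fit sees two distinct x positions
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                aes(group = Study))
}

p <- scatplotprot_by_study(toydata, "P12345")
print(p)
```

On the toy data this should draw one line per Study between the two Dx groups. Whether that matches the 3 to 8 lines expected for the real data depends on how many studies report each Acession.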

I hope someone can help me clear up this hopefully small problem.

Warmest,

Patrick

PWvanZalm
    Welcome to Stack Overflow! You have a good start to a question here, but you are missing a reproducible example that we can use to test your code and see what's going on. You can see some ideas for how to make an example dataset to add to your question [in this link](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – aosmith Apr 17 '19 at 18:44
  • Remove the call to group in geom_smooth and observe what happens, i.e. change your code to `geom_smooth(method = 'lm')`. You're plotting intensity vs Dx but trying to do regression on another variable. Read the examples at https://ggplot2.tidyverse.org/reference/geom_smooth.html – infominer Apr 17 '19 at 19:21

0 Answers