
I have a data frame with three columns, call them (X, Y, Z), such that:

  • X is a numeric variable
  • Y is a numeric variable
  • Z is a factor variable

I want to plot (using ggplot2) Y against X and make color groups based on the factor variable Z. This I have managed!

Now I need to plot some regression lines. I know how to plot one regression line for each set of points belonging to the same category (i.e. the same level of Z). However, what I need is TWO regression lines for each category (this might seem weird, but in the problem I am dealing with it is always done this way). So, for each category of Z, I should have one regression line fitted to the first n points of that category and a second regression line fitted to the remaining points of that category. Both lines should, of course, have the same color, since they fit points from the same category (i.e. the same color group).
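The split described above can be sketched in base R as follows. This is a minimal illustration, not the asker's data: it assumes a data frame `dat` with columns `X`, `Y`, `Z` whose rows are already in the intended order, and a hypothetical cutoff `n`. It fits the two regressions per category and returns their coefficients, i.e. the intercepts and slopes the plotted lines would show.

```r
# Hypothetical example data (assumed): X and Y numeric, Z a factor
set.seed(42)
dat <- data.frame(X = seq(1, 40),
                  Y = rnorm(40),
                  Z = factor(rep(c("a", "b"), times = 20)))
n <- 8  # hypothetical cutoff: the first n points of each category form one segment

# Position of each row within its own category (1, 2, 3, ... for each level of Z)
dat$idx <- ave(seq_len(nrow(dat)), dat$Z, FUN = seq_along)
dat$segment <- ifelse(dat$idx <= n, "first", "rest")

# One lm() per combination of category and segment (4 fits for 2 categories)
fits <- lapply(split(dat, list(dat$Z, dat$segment)),
               function(d) coef(lm(Y ~ X, data = d)))
fits
```

Each element of `fits` is named `category.segment` (e.g. `a.first`, `a.rest`), so the two lines for one category can be compared directly.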

Any help is very much appreciated! Thank you in advance.

Miguel Garcia
  • Hi Miguel. Could you please add a snippet of your data as a `dput` and the code you tried so far? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). To post your data, type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. – stefan Jan 23 '21 at 20:26

1 Answer


If the two ranges of x that you want to fit are independent and you want to generate 4 separate regression lines (as I understand your question), then you can specify the data to use in 2 calls to geom_smooth(). Here, head() and tail() select which rows (and therefore which values of x) each regression is fitted to, assuming the rows of df are ordered by x. If they are not, you will need to sort them first (e.g. with arrange(df, x)).

library(tidyverse)

# some test data with 3 variables: a random response (y), a sequence (x), and a factor (z)
set.seed(1)  # make the random test data reproducible
df <- tibble(x = seq(0.5, 25, 0.5),
             y = rnorm(50),
             z = sample(x = c("A", "B"), replace = TRUE, size = 50))

# a plot with each level of z coloured and 2 regression lines for each level:
# one fitted to the first 30 rows of df, one fitted to the last 20 rows
ggplot(df, aes(x, y, colour = z)) +
  geom_point() +
  geom_smooth(data = ~ head(.x, 30), method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(data = ~ tail(.x, 20), method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()
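The head()/tail() calls split df by row position over the whole data set, so the first 30 rows may contain a different number of points from each level of z. If "the first n elements" should instead be counted within each category, as the question describes, one possible variant is to label each point's segment per group and map `group` to the combination of z and segment. This is a sketch under assumptions: the cutoff `n` and the column name `segment` are hypothetical, and the test data is generated the same way as above.

```r
library(tidyverse)

# test data as above: random y, ordered x, factor z
set.seed(1)
df <- tibble(x = seq(0.5, 25, 0.5),
             y = rnorm(50),
             z = sample(x = c("A", "B"), replace = TRUE, size = 50))

n <- 10  # hypothetical cutoff: first n points of each level of z

df_split <- df %>%
  arrange(z, x) %>%                  # order points by x within each level of z
  group_by(z) %>%
  mutate(segment = if_else(row_number() <= n, "first", "rest")) %>%
  ungroup()

p <- ggplot(df_split, aes(x, y, colour = z)) +
  geom_point() +
  # group by level AND segment so each level gets two separate fits,
  # while colour is still mapped to z only
  geom_smooth(aes(group = interaction(z, segment)),
              method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()
p
```

Because colour is still mapped to z alone, both lines for a category share its colour, which is what the question asks for.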
SGE