
I have a scatterplot, and I want to filter the data behind it.

[Image: the blue scatterplot together with the three curves described below]

The image shows four layers: 1) a green curve in the middle, 2) an upper black curve, 3) a lower black curve, and 4) a blue scatterplot.

All of these come from a data frame:

Blue scatterplot:

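df <- mtcars
# data must be passed by name: the first positional argument of geom_point() is mapping
# (aes(x, y) assumes df has columns x and y, which mtcars itself does not)
geom_point(data = df, aes(x, y), color = 'blue')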

Green curve:

geom_smooth(formula=y~x, method='loess', color='green3', se=FALSE, size=0.5)

Upper curve:

geom_smooth(formula=y+1~x, method='loess', color='gray20', se=FALSE, size=0.5)

Lower curve:

geom_smooth(formula=y-1~x, method='loess', color='gray20', se=FALSE, size=0.5)

I want to filter the blue points using the two black curves, so that only the points lying between them remain and the points outside them (the outliers) are removed.

I tried `which`, `filter`, and `subset`, but none of them gives me the output I want.

In the end, I want the scatter data that lies between the two black lines.
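Assuming the three curves really are a loess fit and that same fit shifted by +1 and -1 (as the `formula=y+1~x` and `formula=y-1~x` layers suggest), the filtering can be done on the data instead of in the plot: fit the same loess model, compute each point's fitted value at its own x, and keep the points whose vertical distance from it is at most 1. Because loess is a linear smoother that reproduces constants, fitting `y + 1` should give the fit of `y` shifted up by 1. This is a sketch under those assumptions: the data below are made up (hypothetical columns `x` and `y`), and it relies on `loess()` defaults, which should match what `geom_smooth(method = 'loess')` draws with its default `span = 0.75`.

```r
# Hypothetical stand-in data: the real data frame is assumed to have
# numeric columns x and y, like the aes(x, y) mapping in the question.
set.seed(42)
df <- data.frame(x = runif(200, min = 0, max = 10))
df$y <- sin(df$x) + rnorm(200, sd = 0.8)

# Fit the model behind the green curve (loess with default settings)
fit <- loess(y ~ x, data = df)

# Value of the green curve at each point's own x
df$fitted <- predict(fit, newdata = df)

# The black curves are assumed to sit 1 unit above and below the green curve,
# so a point lies between them when its residual is within [-1, 1].
between <- abs(df$y - df$fitted) <= 1

inside  <- df[between, ]   # points kept
outside <- df[!between, ]  # outliers removed
```

`inside` can then be drawn with `geom_point(data = inside, aes(x, y), color = 'blue')`; with dplyr the same filter would be `dplyr::filter(df, abs(y - fitted) <= 1)`. Since `geom_smooth()` draws its curve through a grid of evaluation points, a point sitting almost exactly on a black line may be classified slightly differently from how it looks in the plot.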

  • Please make your question reproducible: include your code and data; use `dput(...your data...)` for your data. Have a look at the help pages on creating a minimal reproducible example and on how to ask a good question for guidance. – Peter Jul 09 '20 at 09:21

1 Answer


I am posting a solution since this question may be helpful to others. The general idea is conditional coloring of the points: if a point falls between the curves, it gets a color; otherwise its color is NA, so it is not drawn.

Here I assume that we know the functions of the curves, so we can use them in `ifelse`. If that is not the case, we first need to find a best fit. You can find helpful answers about fitting a curve to specific data in this thread.

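# Lower curve: y = x^4, evaluated at x = 1..10
x <- 1:10
y <- x^4

# Random points to be filtered
set.seed(123)
xp <- rnorm(100, mean=5.5, sd = 4)
yp <- rnorm(100, mean=5e3, sd=5e3)

# Lower black curve, middle green curve, upper black curve
# (ylim widened so the upper curve is not clipped)
plot(x, y, type = "l", ylim = range(0, y + 2*mean(y)))
lines(x, y + mean(y), col = "green")
lines(x, y + 2*mean(y))
# Blue if the point lies between the two black curves, NA (not drawn) otherwise
points(x = xp, y = yp, col = ifelse(yp < xp^4 + 2*mean(y) & yp > xp^4, "blue", NA))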
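The same condition used for coloring can also subset the data, so that points outside the curves are actually dropped rather than just hidden. A minimal sketch, reusing the curve y = x^4 and the random points from above; the names `pts`, `keep`, `between`, and `removed` are only illustrative:

```r
# Same curve and random points as above
x <- 1:10
y <- x^4
set.seed(123)
xp <- rnorm(100, mean = 5.5, sd = 4)
yp <- rnorm(100, mean = 5e3, sd = 5e3)
pts <- data.frame(xp, yp)

# Same test as in ifelse(): above the lower curve y = x^4
# and below the upper curve y = x^4 + 2*mean(y)
keep <- pts$yp > pts$xp^4 & pts$yp < pts$xp^4 + 2 * mean(y)

between <- pts[keep, ]   # points lying between the two curves
removed <- pts[!keep, ]  # points outside the curves, dropped
```

Base R's `subset()` gives the same rows: `subset(pts, yp > xp^4 & yp < xp^4 + 2 * mean(y))`.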

M--
  • Yes, but how do we extract the data between those lines and delete the data lying outside of them? – Kshitij15571 Jul 19 '20 at 21:42
  • @Kshitij15571 You need to clarify your question. As Peter mentioned, we need a reproducible example of your data to provide a working solution. Please edit your question and add some data. More importantly, answer these questions, again as an edit to your question. Do you have the equation of the lines? If so, what are the equations? – M-- Jul 19 '20 at 21:45
  • If you want to delete them, rather than just not showing them in the graph, we can subset the data, with the same conditions I mentioned above in my answer. But to provide a solution, again, we need a reproducible example of your data and the answers to the questions I laid out in my previous comment. – M-- Jul 19 '20 at 21:47
  • I tried subsetting the data set, but it is not working. Also, I am quite new to coding, and I don't know how to post reproducible examples. – Kshitij15571 Jul 21 '20 at 08:53
  • @Kshitij15571 Read the link shared with you in the comments about creating a reproducible example. – M-- Jul 21 '20 at 13:23
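For the `dput()` suggestion in the first comment, a small slice of the data is usually enough. A minimal sketch, with `mtcars` and its `mpg` and `wt` columns used only as a stand-in for your own data frame:

```r
# Print a copy-pasteable definition of the first six rows of two columns.
# Replace mtcars[, c("mpg", "wt")] with your own data frame and columns.
dput(head(mtcars[, c("mpg", "wt")]))
```

Paste the printed `structure(...)` output into the question; anyone can then recreate the same data frame by assigning it, for example `df <- structure(...)`.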