In R, I have two data frames, each containing a column of data I am interested in. I can plot these on a scatter plot and add a straight line with `abline(a, b)` (a = intercept, b = gradient). However, I want to calculate the proportion of the data above and below this line, that is, what percentage of each dataset falls in each section of the plot as separated by the line.

There may or may not be a function for exactly this purpose, so splitting each dataset according to the equation of the line would also work: I could then just count how many rows of data fall on each side.

I would think this should be easy enough to do, but I have been unable to find any information on this sort of operation. Thanks for any help.

For instance:

Consider the small dataset `c(10, 20, 30, 40, 50)` and another dataset `c(15, 30, 45, 60, 75)`, both plotted against the shared x values `c(1, 2, 3, 4, 5)`. I will plot these on the same graph and use `abline(h = 25)` to draw a horizontal line (for simplicity; for my real data the line is diagonal). I then want to work out the percentage of data above and below the line. For dataset one, 10 and 20 are below the line, so 2 out of 5, or 40%, of the data is below the line.

How can I do this in R? Thanks.
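
A minimal sketch of one way to approach this, assuming the line is described by the same intercept `a` and gradient `b` that would be passed to `abline(a, b)`: compare each observed value with the line's value at the same x. The names `x`, `y1`, `y2`, `prop_below`, `df1` and `parts` are made up for illustration, and the horizontal line at 25 is written as intercept 25 with gradient 0.

```r
# Example data from the question: two datasets sharing the same x values
x  <- c(1, 2, 3, 4, 5)
y1 <- c(10, 20, 30, 40, 50)
y2 <- c(15, 30, 45, 60, 75)

# Line y = a + b * x; a horizontal line at 25 has intercept 25 and gradient 0
a <- 25
b <- 0

# Hypothetical helper: proportion of points strictly below the line
prop_below <- function(x, y, a, b) mean(y < a + b * x)

below1 <- prop_below(x, y1, a, b)  # 0.4 (10 and 20 are below 25)
below2 <- prop_below(x, y2, a, b)  # 0.2 (only 15 is below 25)

# Alternative: split a data frame by side of the line, then count rows.
# Points lying exactly on the line are grouped with "above" here.
df1   <- data.frame(x = x, y = y1)
parts <- split(df1, ifelse(df1$y < a + b * df1$x, "below", "above"))
sapply(parts, nrow)
```

For a diagonal line only `a` and `b` change; the comparison `y < a + b * x` stays the same. How to treat points that fall exactly on the line is a choice this sketch makes arbitrarily.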

Edit: I found an answer to this problem here: "Calculating the number of dots lie above and below the regression line with R"

  • It would be easier to help you if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output for that sample data. Assuming you have the model, it sounds like you just want to compare the actual values to the `fitted()` values. – MrFlick Nov 30 '15 at 03:42
  • I added something to this effect; I'm not entirely sure if it is what you had in mind. – user3069051 Nov 30 '15 at 03:55
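
The `fitted()` suggestion in the comments could be sketched roughly as follows, assuming the line comes from a linear model rather than from a hand-picked intercept and gradient. This uses R's built-in `cars` dataset as a stand-in, since the asker's real data is not shown; `fit`, `above`, `below`, `pct_above` and `pct_below` are illustrative names.

```r
# Built-in example data (speed vs. stopping distance), not the asker's data
fit <- lm(dist ~ speed, data = cars)

# Compare each observed value with the fitted value on the regression line
above <- cars$dist > fitted(fit)
below <- cars$dist < fitted(fit)

# Percentage of points on each side of the line
pct_above <- 100 * mean(above)
pct_below <- 100 * mean(below)
c(above = pct_above, below = pct_below)

# The same counts via the sign of the residuals (-1 = below, 1 = above)
table(sign(residuals(fit)))
```

The two percentages need not be equal, and they sum to less than 100 only if some points lie exactly on the line.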
