
In R, I want to overlay a line of best fit on a scatter plot of the raw data, with the predictor variable `RM` on the x-axis and the outcome variable `Price` on the y-axis. (The data contains the variable `Price`, which will eventually be predicted on new data using `lm()` and `predict()`.) I'm reviewing this plot before running a regression, to judge whether RM is significant, i.e. whether it influences Price through a roughly linear trend. If the plot shows no relationship, I will drop RM from the data set before fitting the linear regression with `lm()` and making predictions. I'm doing this for each potential predictor variable.
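A minimal sketch of the `lm()`/`predict()` workflow described above, assuming a data frame with numeric `RM` and `Price` columns. Because `BostonHousing.csv` isn't available, the data here is simulated; the names `train` and `new_rooms` and all numbers are hypothetical.

```r
# Hypothetical stand-in for BostonHousing.csv: simulated RM and Price columns
set.seed(1)
n <- 100
train <- data.frame(RM = runif(n, 4, 9))
train$Price <- -35 + 9 * train$RM + rnorm(n, sd = 5)

# Fit the simple linear regression of Price on RM
fit <- lm(Price ~ RM, data = train)

# Predict Price for new RM values; the column name must match the formula term
new_rooms <- data.frame(RM = c(5, 6.5, 8))
pred <- predict(fit, newdata = new_rooms)
print(pred)
```

`predict()` looks up `RM` by name in `newdata`, so the new data frame needs a column with exactly that name.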

Do I need to calculate the y-intercept and slope first? If so, how?
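For a straight line fitted by ordinary least squares, the slope and intercept have closed forms, so they can be computed up front if desired. Taking $x_i$ as the RM values and $y_i$ as the Price values for $i = 1, \dots, n$:

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
$$

`lm(Price ~ RM)` computes the same two numbers, and `abline()` accepts either the fitted model or the intercept `a` and slope `b` directly, so computing them by hand is optional.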

```r
BostonHousingW.df <- read.csv("BostonHousing.csv")
plot(Price ~ RM, data = BostonHousingW.df,
     xlab = "Avg # of Rooms",
     ylab = "Median Home Price",
     main = "Boston Housing Data\n Median Home Price and Avg # of Rooms")
```
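One way to add the line, sketched under the assumption that the data frame has numeric `RM` and `Price` columns. Since the CSV isn't available, `BostonHousingW.df` is filled with hypothetical simulated data here. The sketch draws the line two ways: from an `lm()` fit, and from a slope and intercept computed by hand with the least-squares formulas (slope = cov/var, intercept = mean(y) minus slope times mean(x)).

```r
# Hypothetical stand-in for BostonHousing.csv (the file isn't available here):
# simulated room counts and prices with a linear trend plus noise
set.seed(42)
BostonHousingW.df <- data.frame(RM = runif(200, 4, 9))
BostonHousingW.df$Price <- -35 + 9 * BostonHousingW.df$RM + rnorm(200, sd = 6)

# Scatter plot of the raw data
plot(Price ~ RM, data = BostonHousingW.df,
     xlab = "Avg # of Rooms",
     ylab = "Median Home Price",
     main = "Boston Housing Data\n Median Home Price and Avg # of Rooms")

# Option 1: let lm() estimate intercept and slope, then draw the fitted line
fit <- lm(Price ~ RM, data = BostonHousingW.df)
abline(fit, col = "red", lwd = 2)

# Option 2: compute the least-squares slope and intercept by hand
x <- BostonHousingW.df$RM
y <- BostonHousingW.df$Price
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
abline(a = intercept, b = slope, col = "blue", lty = 2)

# Both routes give the same coefficients
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

In practice `abline(lm(...))` is the usual idiom; the manual computation only shows what `lm()` is doing for a single predictor.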
  • Does this answer your question? https://stackoverflow.com/a/3480460/12957340 – jared_mamrot Aug 30 '20 at 00:21
  • Or perhaps just `cor(RM, Price)` – G5W Aug 30 '20 at 00:34
  • I've searched and found only one variant (in many locations) of a `BostonHousing.csv` dataset, and it contains neither `$Price` nor `$RM`. Lacking usable data ... I'm out. – r2evans Aug 30 '20 at 00:50
  • Why are you using a plot to determine what is significant? You should be running your regression and looking at your coefficients from there to determine what's significant. Please also consider how to [make a good example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), such as including a data set that can be used to evaluate your code. – John Polo Aug 30 '20 at 01:04
  • I prefer stats myself but using visualizations was part of the assignment. – Andrea Whittaker Aug 31 '20 at 00:24
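The comments point at two numeric checks: the correlation between RM and Price, and the significance of the RM coefficient in a fitted regression. A minimal sketch of both, on simulated data standing in for the unavailable CSV (the data frame `df` and its values are hypothetical). With a single predictor, the t statistic for the slope equals the one reported by `cor.test()` for the Pearson correlation, so the two routes agree on significance.

```r
# Hypothetical simulated data standing in for BostonHousing.csv
set.seed(7)
df <- data.frame(RM = runif(150, 4, 9))
df$Price <- -35 + 9 * df$RM + rnorm(150, sd = 6)

# Strength of the linear association (Pearson correlation)
r <- cor(df$RM, df$Price)

# Test H0: true correlation is 0
ct <- cor.test(df$RM, df$Price)

# Regression route: p-value of the RM slope from the coefficient table
fit <- lm(Price ~ RM, data = df)
coef_table <- summary(fit)$coefficients
p_slope <- coef_table["RM", "Pr(>|t|)"]

print(r)
print(ct$p.value)
print(p_slope)
```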
