In R, I want to overlay a line of best fit on a scatter plot of the raw data, with the predictor variable "RM" on the x-axis and the outcome variable "Price" on the y-axis. (The data contains the variable "Price", which will eventually be predicted on new data using lm() and predict().) I'm reviewing this plot before running the regression to judge whether RM influences Price, that is, whether the relationship looks roughly linear. If the plot shows no relationship, I will drop RM from the data set before fitting the linear regression with lm() and predicting from it. I'm doing this for each potential predictor variable.
Do I need to calculate the y-intercept and slope first? If so, how?
BostonHousingW.df <- read.csv("BostonHousing.csv")
plot(Price ~ RM, data = BostonHousingW.df,
     xlab = "Avg # of Rooms",
     ylab = "Median Home Price",
     main = "Boston Housing Data\n Median Home Price and Avg # of Rooms")
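
For reference, here is a minimal sketch of one common approach: fit the model with lm() and pass the fitted object to abline(), which takes the intercept and slope from the fit, so neither has to be computed by hand. The data frame `housing.df` and its values are synthetic stand-ins made up for illustration (an assumption), not the contents of BostonHousing.csv; only the column names RM and Price are taken from the question.

```r
# Synthetic stand-in for BostonHousing.csv (values are made up for illustration)
set.seed(1)
housing.df <- data.frame(RM = runif(50, min = 4, max = 8))
housing.df$Price <- -30 + 9 * housing.df$RM + rnorm(50, sd = 3)

# Fit a simple linear regression of Price on RM
fit <- lm(Price ~ RM, data = housing.df)

# Scatter plot of the raw data
plot(Price ~ RM, data = housing.df,
     xlab = "Avg # of Rooms",
     ylab = "Median Home Price",
     main = "Median Home Price and Avg # of Rooms")

# abline() reads the intercept and slope from the fitted model
abline(fit, col = "red", lwd = 2)

# The fitted intercept and slope, if you want to inspect them
coef(fit)
```

With the real data, the same two calls would presumably be lm(Price ~ RM, data = BostonHousingW.df) followed by abline() on the result, issued after the existing plot() call. summary(fit) also reports a p-value for the RM slope, which may be a more direct check than judging the plot by eye.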