Notice that the graphic you constructed in Problem 4 shows a quadratic (curved) relationship between log_wage and exp. The next task is to plot three quadratic functions, one for each race level: "black", "white" and "other". To estimate the quadratic fit, you can use the following function quad_fit:
```{r}
quad_fit <- function(data_sub) {
return(lm(log_wage~exp+I(exp^2),data=data_sub)$coefficients)
}
quad_fit(salary_data)
```
The above function computes the least squares quadratic fit and returns the coefficients a1, a2, a3, where
$$\hat{Y} = a_1 + a_2 x + a_3 x^2,$$
with $\hat{Y}$ = log(wage) and $x$ = exp.
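To make the role of the three coefficients concrete, here is a minimal sketch that evaluates the fitted quadratic on an evenly spaced grid of exp values. Since salary_data is not reproduced here, it uses simulated stand-in data; `toy_data`, `exp_grid` and `fitted_curve` are hypothetical names introduced only for illustration.

```r
# Simulated stand-in for salary_data (hypothetical values, for illustration only)
set.seed(1)
toy_data <- data.frame(exp = runif(200, 0, 40))
toy_data$log_wage <- 5 + 0.06 * toy_data$exp - 0.001 * toy_data$exp^2 +
  rnorm(200, sd = 0.3)

# Same function as in the assignment
quad_fit <- function(data_sub) {
  return(lm(log_wage ~ exp + I(exp^2), data = data_sub)$coefficients)
}

# a[1] = intercept, a[2] = coefficient on exp, a[3] = coefficient on exp^2
a <- unname(quad_fit(toy_data))

# Evaluate Y(hat) = a1 + a2*x + a3*x^2 on a grid of x values
exp_grid <- seq(min(toy_data$exp), max(toy_data$exp), length.out = 100)
fitted_curve <- a[1] + a[2] * exp_grid + a[3] * exp_grid^2
```

Evaluating on a sorted grid rather than on the raw exp column gives a smooth curve when the values are later drawn as a line.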
Use ggplot to accomplish this task or use base R graphics for partial credit. Make sure to include a legend and appropriate labels.
My attempt:
blackfit <- quad_fit(salary_data[salary_data$race == "black",])
whitefit <- quad_fit(salary_data[salary_data$race == "white",])
otherfit <- quad_fit(salary_data[salary_data$race == "other",])
yblack <- blackfit[1] + blackfit[2]*salary_data$exp + blackfit[3]*(salary_data$exp)^2
ywhite <- whitefit[1] + whitefit[2]*salary_data$exp + whitefit[3]*(salary_data$exp)^2
yother <- otherfit[1] + otherfit[2]*salary_data$exp + otherfit[3]*(salary_data$exp)^2
soloblack <- salary_data[salary_data$race == "black",]
solowhite <- salary_data[salary_data$race == "white",]
soloother <- salary_data[salary_data$race == "other",]
ggplot(data = soloblack) +
geom_point(aes(x = exp, y = log_wage)) +
stat_smooth(aes(y = log_wage, x = exp), formula = y ~ yblack)
This is only a first attempt, for the data filtered to race == "black". I am not clear on what the formula should look like, because the quad_fit function seems to already do the calculations for you.
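For reference, one possible way to draw all three curves with a single legend is sketched below. It is only a sketch under assumptions: salary_data is replaced by simulated data (`toy_data`), and `exp_grid` and `curves` are hypothetical helper names. The idea is to use the quad_fit coefficients to compute each race's curve on a grid, stack the results into one data frame, and map race to colour.

```r
library(ggplot2)

# Simulated stand-in for salary_data (hypothetical values, for illustration only)
set.seed(1)
toy_data <- data.frame(
  exp  = runif(300, 0, 40),
  race = sample(c("black", "white", "other"), 300, replace = TRUE)
)
toy_data$log_wage <- 5 + 0.06 * toy_data$exp - 0.001 * toy_data$exp^2 +
  rnorm(300, sd = 0.3)

# Same function as in the assignment
quad_fit <- function(data_sub) {
  return(lm(log_wage ~ exp + I(exp^2), data = data_sub)$coefficients)
}

# For each race level, evaluate that level's fitted quadratic on a grid of exp values
exp_grid <- seq(min(toy_data$exp), max(toy_data$exp), length.out = 100)
curves <- do.call(rbind, lapply(c("black", "white", "other"), function(r) {
  a <- unname(quad_fit(toy_data[toy_data$race == r, ]))
  data.frame(race = r, exp = exp_grid,
             log_wage = a[1] + a[2] * exp_grid + a[3] * exp_grid^2)
}))

# Points and curves share the colour aesthetic, so ggplot builds one legend
p <- ggplot(toy_data, aes(x = exp, y = log_wage, colour = race)) +
  geom_point(alpha = 0.3) +
  geom_line(data = curves) +
  labs(x = "Experience (exp)", y = "log(wage)", colour = "Race",
       title = "Quadratic fit of log wage on experience, by race")
print(p)
```

If using quad_fit directly is not required, `stat_smooth(method = "lm", formula = y ~ x + I(x^2))` should give the same curves. In that formula, `y` and `x` refer to the plot aesthetics rather than to column names or precomputed vectors.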