I have a large data file (LMTESTData) that contains internal data and the results of an external assessment. Rather than subsetting it manually, I have tried a number of variants of by and ddply to run a linear regression on each subset, without success.
colnames(LMTESTData)
[1] "StudentNumber" "SubjectCode" "SubjectName" "ExamMark" "AssessmentMark" "U" "hmkk"
[8] "TESmk" "Year"
The regression model is lm(hmkk ~ ExamMark + AssessmentMark), fitted separately for each SubjectCode.
Once the model is working, my next challenge will be to predict hmkk given SubjectCode, ExamMark and AssessmentMark for each StudentNumber.
Dummy Data Set
# rnorm() takes n first; the five subject codes are recycled across the 100 rows
LMTESTData <- data.frame(StudentNumber = 1:100, SubjectCode = c("A","B","C","D","E"),
                         hmkk = rnorm(100, mean = 72),
                         ExamMark = rnorm(100, mean = 62), AssessmentMark = rnorm(100, mean = 68))
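
For reference, here is a minimal sketch of the kind of result being asked for, assuming base R alone is acceptable (split() and lapply() rather than by or ddply). The names models, coefs and hmkk_pred are hypothetical and chosen only for illustration, and the set.seed() call is added only so the dummy data is reproducible.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Dummy data in the same shape as LMTESTData
LMTESTData <- data.frame(
  StudentNumber  = 1:100,
  SubjectCode    = rep(c("A", "B", "C", "D", "E"), length.out = 100),
  hmkk           = rnorm(100, mean = 72),
  ExamMark       = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68)
)

# Fit one model per SubjectCode: split() returns a named list of data frames,
# and lapply() fits lm() on each of them
models <- lapply(
  split(LMTESTData, LMTESTData$SubjectCode),
  function(d) lm(hmkk ~ ExamMark + AssessmentMark, data = d)
)

# Coefficient table with one row per subject (hypothetical name: coefs)
coefs <- t(sapply(models, coef))

# Predict hmkk for every student using the model of that student's subject
LMTESTData$hmkk_pred <- NA_real_
for (code in names(models)) {
  rows <- LMTESTData$SubjectCode == code
  LMTESTData$hmkk_pred[rows] <- predict(models[[code]], newdata = LMTESTData[rows, ])
}
```

Here coefs can be inspected directly, and summary(models[["A"]]) gives the usual regression output for a single subject. Passing a different data frame as newdata, with the same ExamMark and AssessmentMark columns, would give predictions for students who do not yet have an hmkk value.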