In my dataset, there are some features which are not always present:
HW_GRADE: range 0-100
HW_RESUBMISSION: if present, range 0-100
In other words, if the student did not resubmit, that feature is absent. As far as I can tell, scikit-learn doesn't like NaN or blank features. Using interpolation to force a value into that feature doesn't make sense. I could also create a binary variable 'HW_RESUBMITTED', which would be 0 if HW_RESUBMISSION is NaN and 1 otherwise. But the actual value, when present, is also a useful discriminator.
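To make the idea concrete, here is a minimal sketch of keeping both the indicator and the actual value. It assumes NumPy arrays, the sample grades are made up, and filling the missing resubmission grade with 0 is only a placeholder choice, not a recommendation:

```python
import numpy as np

# Hypothetical sample data: NaN means the student did not resubmit.
hw_grade = np.array([85.0, 40.0, 70.0, 55.0])
hw_resubmission = np.array([np.nan, 75.0, np.nan, 90.0])

# Binary indicator: 1 if a resubmission exists, 0 otherwise.
hw_resubmitted = (~np.isnan(hw_resubmission)).astype(int)

# Replace missing resubmission grades with a constant placeholder (0 here, an assumption).
hw_resubmission_filled = np.where(np.isnan(hw_resubmission), 0.0, hw_resubmission)

# Final feature matrix with no NaN: grade, filled resubmission grade, indicator.
X = np.column_stack([hw_grade, hw_resubmission_filled, hw_resubmitted])
print(X)
```

With the indicator column present, a model can in principle tell a filled-in 0 apart from a genuine resubmission score of 0. scikit-learn's SimpleImputer also has an add_indicator option that appears to build a similar column.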
The referenced possible duplicate states that missing values are a problem. I agree. In fact, my question is asking for the right way to deal with a scenario where interpolation would lead to the wrong results, and simply setting the missing values to a fixed '0' would also lead to incorrect results. I propose a possible way to handle this and am looking for someone more advanced than me to comment.