This question focuses on pandas' own functions. There are already solutions (pandas DataFrame: replace nan values with average of columns), but they rely on self-written functions.
In SPSS there is the function MEAN.n, which gives you the mean value of a list of numbers only when at least n elements of that list are valid (not pandas.NA). With that function you are able to impute missing values only if a minimum number of items are valid.
Is there a pandas function to do this?
Example
Values [1, 2, 3, 4, NA].
Mean of the valid values is 2.5.
The resulting list should be [1, 2, 3, 4, 2.5].
Assume the rule that in a 5-item list at least 3 items must have valid values for imputation to be allowed. Otherwise the missing values stay NA.
Values [1, 2, NA, NA, NA].
Mean of the valid values is 1.5, but it does not matter.
The resulting list should not be changed, [1, 2, NA, NA, NA], because imputation is not allowed.
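
To make the two examples concrete, here is a minimal sketch of the desired behaviour. It combines only existing pandas methods (Series.count, Series.mean, Series.fillna). The helper name fill_with_mean_if_enough and its min_valid parameter are hypothetical, and the nullable "Float64" dtype is an assumption chosen so that pandas.NA can sit next to the numbers.

```python
import pandas as pd


def fill_with_mean_if_enough(values: pd.Series, min_valid: int) -> pd.Series:
    """Hypothetical helper, not part of pandas.

    Fill missing entries with the mean of the valid entries,
    but only if at least `min_valid` entries are valid.
    """
    # count() returns the number of non-missing entries
    if values.count() >= min_valid:
        # mean() skips missing values by default
        return values.fillna(values.mean())
    # too few valid entries: leave the series unchanged
    return values


# The two lists from the question (assumed dtype: nullable Float64)
a = pd.Series([1, 2, 3, 4, pd.NA], dtype="Float64")
b = pd.Series([1, 2, pd.NA, pd.NA, pd.NA], dtype="Float64")

print(fill_with_mean_if_enough(a, min_valid=3).tolist())
print(fill_with_mean_if_enough(b, min_valid=3).tolist())
```

With min_valid=3 the NA in the first list is replaced by 2.5, while the second list is returned unchanged because only 2 of its 5 items are valid. What I am looking for is whether pandas already ships something equivalent to this.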