I'm trying to extract outliers from my dataset and tag them accordingly.
Sample data:
   Doctor Name  Hospital Assigned        Region  Claims  Illness Claimed
1  Albert       Some hospital Center     R-1     20      Sepsis
2  Simon        Another hospital Center  R-2     21      Pneumonia
3  Alvin        ...                      ...     ...     ...
4  Robert
5  Benedict
6  Cruz
So I'm trying to group every Doctor that Claimed a certain Illness in a certain Region and find the outliers within each group. The desired output would look like this:
   Doctor Name  Hospital Assigned        Region  Claims  Illness Claimed  is_outlier
1  Albert       Some hospital Center     R-1     20      Sepsis           1
2  Simon        Another hospital Center  R-2     21      Pneumonia        0
3  Alvin        ...                      ...     ...     ...
4  Robert
5  Benedict
6  Cruz
I can do this in Power BI, but I can't seem to do it in R. I'm guessing that dplyr's group_by() function is involved, but I'm not sure.
This is the algorithm I'm trying to implement:
  Read data
  Group data by Illness
    Group by Region
      Get Q3 and the IQR of Claims within the group
      If Claims > Q3 + 1.5 * IQR
        then tag it as an outlier (is_outlier = 1)
      else
        not an outlier (is_outlier = 0)
  Export data
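For reference, the steps above might map onto dplyr roughly as follows. This is only a sketch under assumptions: the data frame `claims`, its column names (`Doctor`, `Region`, `Illness`, `Claims`), the values, and the output file name are all made up for illustration, and only the upper fence (Q3 + 1.5 * IQR) is checked, as in the algorithm.

```r
library(dplyr)

# Hypothetical data: names and values are assumptions, not the real dataset
claims <- data.frame(
  Doctor  = paste0("Doctor", 1:11),
  Region  = c(rep("R-1", 6), rep("R-2", 5)),
  Illness = c(rep("Sepsis", 6), rep("Pneumonia", 5)),
  Claims  = c(20, 21, 19, 22, 20, 90, 21, 23, 22, 24, 60)
)

tagged <- claims %>%
  group_by(Illness, Region) %>%                    # one group per Illness/Region pair
  mutate(
    q3         = quantile(Claims, 0.75, names = FALSE),  # third quartile within the group
    iqr        = IQR(Claims),                            # interquartile range within the group
    is_outlier = as.integer(Claims > q3 + 1.5 * iqr)     # 1 = above the upper fence, 0 = not
  ) %>%
  ungroup()

# Export; the file name and location are placeholders
out_file <- file.path(tempdir(), "claims_tagged.csv")
write.csv(tagged, out_file, row.names = FALSE)
```

Grouping by both columns in a single group_by() call has the same effect as grouping by Illness and then by Region, and mutate() keeps every row, so the flag lands next to each doctor instead of collapsing the groups the way summarise() would.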
I've done something similar before, but that code loops through each Region/Illness combination and fits a linear regression to each one. Is this anywhere near what I'm trying to achieve?
# Split the data frame by Region and Illness_Code and fit a model per group
groups <- split(df, list(df$Region, df$Illness_Code))
# Keep only the groups that have more than one row
Ind <- sapply(groups, function(x) nrow(x) > 1)
out <- lapply(groups[Ind], function(g) {
  # Linear trend of COUNT over YEAR within this group
  m <- lm(formula = COUNT ~ YEAR, data = g)
  coef(m)
})
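The same split/lapply pattern could probably be reused for the outlier tagging by swapping the model fit for a quantile check. A minimal base R sketch, assuming a data frame `df` with the `Region`, `Illness_Code` and `COUNT` columns used above (the values and illness codes below are hypothetical):

```r
# Hypothetical data: values and illness codes are assumptions
df <- data.frame(
  Region       = c(rep("R-1", 6), rep("R-2", 5)),
  Illness_Code = c(rep("SEP", 6), rep("PNE", 5)),
  COUNT        = c(20, 21, 19, 22, 20, 90, 21, 23, 22, 24, 60)
)

# drop = TRUE skips Region/Illness_Code combinations that have no rows
groups <- split(df, list(df$Region, df$Illness_Code), drop = TRUE)

tagged <- do.call(rbind, lapply(groups, function(g) {
  # Upper fence for this group: Q3 + 1.5 * IQR
  fence <- quantile(g$COUNT, 0.75, names = FALSE) + 1.5 * IQR(g$COUNT)
  g$is_outlier <- as.integer(g$COUNT > fence)
  g
}))
```

Note that the rows come back ordered by group rather than in their original order.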
Any ideas?