OK so admittedly this is related to another question here, but there has been no response and I suspect that is because I made it too complex. So I'm asking this simplified version as a separate question. Happy to be scolded if this is not acceptable.
My core problem is that I want to create a dataframe that contains only the outliers from each column. The dataframe looks like:
chr  leftPos  TBGGT  12_try  324Gtt  AMN2
1    24352    34     43      19      43
1    53534    2      1       -1      -9
2    34       -15    7       -9      -18
3    3443     -100   -4      4       -9
3    3445     -100   -1      6       -1
3    3667     5      -5      9       5
3    7882     -8     -9      1       3
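To make the example reproducible, the table above can be read straight into R with read.table(). This is a minimal sketch assuming the data is pasted as whitespace-separated text; check.names = FALSE is needed to keep the column names as shown, because by default R renames 12_try and 324Gtt to X12_try and X324Gtt (they are not syntactically valid names).

```r
# Example data from the question.
# check.names = FALSE keeps "12_try" and "324Gtt" instead of "X12_try"/"X324Gtt".
df <- read.table(header = TRUE, check.names = FALSE, text = "
chr leftPos TBGGT 12_try 324Gtt AMN2
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
")

# Check column names and types
str(df)
```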
I would like to calculate the upper and lower limits of each column (from the third onwards), exclude all rows whose values fall within those limits so that only the outliers remain, and end up with a dataframe like the one below for each column (shown here for TBGGT). This dataframe then gets passed to the next part of the code (in the loop), but I won't elaborate on that for the sake of simplicity.
chr  leftPos  TBGGT
2    34       -15
3    3443     -100
3    3445     -100
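The upper and lower limits described above can be computed for every column from the third onwards in a single call. This is a sketch assuming the same rule as in the code below (median +/- alpha * IQR with alpha = 1.5) and R's default IQR(), which uses quantile type 7; the name limits is only illustrative.

```r
df <- read.table(header = TRUE, check.names = FALSE, text = "
chr leftPos TBGGT 12_try 324Gtt AMN2
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
")

alpha <- 1.5

# For each data column (dropping chr and leftPos), return
# LL = median - alpha * IQR and UL = median + alpha * IQR.
# sapply() stacks the results into a 2-row matrix, one column per data column.
limits <- sapply(df[-(1:2)], function(z) {
  m <- median(z, na.rm = TRUE)
  s <- alpha * IQR(z, na.rm = TRUE)
  c(LL = m - s, UL = m + s)
})
limits
```

For TBGGT this gives LL = -99.5 and UL = 83.5 (median -8, IQR 61), so with alpha = 1.5 the value -15 lies inside the limits and only the two -100 rows would be flagged for that column.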
My code so far:
alpha <- 1.5
f1 <- function(df, ZCol) {
  # Determine the upper (UL) and lower (LL) limits: median +/- alpha * IQR
  UL <- median(ZCol, na.rm = TRUE) + alpha * IQR(ZCol, na.rm = TRUE)
  LL <- median(ZCol, na.rm = TRUE) - alpha * IQR(ZCol, na.rm = TRUE)
  # Positions of the values that fall outside [LL, UL]
  Zoutliers <- which(ZCol > UL | ZCol < LL)
  Zoutliers
}
But this only gives me the row positions of the outliers, without the chr and leftPos values they are associated with. How do I get those as well?
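One way to keep chr and leftPos is to use the positions returned by which() to index the rows of df directly and then select the columns needed. The sketch below is a hypothetical variant of f1: it takes the column name instead of the column vector and takes alpha as an argument (default 1.5) instead of reading a global, and the names cols and outlier_list are only illustrative. lapply() then builds one outlier dataframe per column, which could be handed to the rest of the loop.

```r
df <- read.table(header = TRUE, check.names = FALSE, text = "
chr leftPos TBGGT 12_try 324Gtt AMN2
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
")

# Hypothetical variant of f1: returns the rows of df whose value in column
# `col` lies outside median +/- alpha * IQR, keeping chr and leftPos.
f1 <- function(df, col, alpha = 1.5) {
  z  <- df[[col]]
  m  <- median(z, na.rm = TRUE)
  s  <- alpha * IQR(z, na.rm = TRUE)
  UL <- m + s
  LL <- m - s
  # which() returns row positions (NAs are dropped), usable as a row index
  Zoutliers <- which(z > UL | z < LL)
  df[Zoutliers, c("chr", "leftPos", col), drop = FALSE]
}

# One outlier dataframe per column, from the third column onwards
cols <- names(df)[-(1:2)]
outlier_list <- setNames(lapply(cols, function(col) f1(df, col)), cols)

outlier_list$TBGGT
```

Each element holds chr, leftPos and the value column for the outlying rows, and the original row names are kept, so they still point back to rows of df. Names that are not syntactically valid need [[ ]] or backticks, e.g. outlier_list[["12_try"]].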