I have a data frame (df) with a categorical variable (CHR, 22 levels) and a continuous variable (POS, the chromosomal position, whose range differs among CHR levels). I want to generate an additional categorical variable that bins POS into equally sized ranges, computed separately from the POS values within each CHR level. For example, suppose this is the df:
CHR POS
1 2
1 4
1 6
. .
. .
1 30
. .
. .
. .
22 150
22 162
22 170
22 185
So I first tried splitting the df by CHR:
> df_split <- split(df, f=df$CHR)
# then I wrote a function that uses the "cut" function
> bins <- function(df){
lower <- min(df$POS)
upper <- max(df$POS)
cut(df$POS, seq(lower,upper, 10))
}
# finally I used lapply with my custom "cut"-based function
> bin_1 <- lapply(df_split, bins)
The problem is that the cut function is not working.
Thanks for any help!
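One possible explanation, offered as a sketch rather than a definitive diagnosis: by default cut() builds right-closed intervals (a, b] and does not include the lowest break, so min(POS) comes back as NA. In addition, seq(lower, upper, 10) stops at or before upper, so any POS above the last break also becomes NA. The version below assumes a bin width of 10 as in the question. The toy data, the `width` argument, the BIN column name and `df_binned` are illustrative assumptions, not part of the original post.

```r
# Toy data standing in for the real df (values made up for illustration)
df <- data.frame(
  CHR = c(1, 1, 1, 1, 22, 22, 22, 22),
  POS = c(2, 4, 6, 30, 150, 162, 170, 185)
)

# Bin POS into fixed-width intervals within a single chromosome.
# `width` is a hypothetical parameter (10 in the question).
bins <- function(d, width = 10) {
  lower <- min(d$POS)
  upper <- max(d$POS)
  # Extend the break sequence past the maximum so max(POS) falls inside a bin
  breaks <- seq(lower, upper + width, by = width)
  # right = FALSE gives left-closed intervals [a, b), so min(POS) is kept
  d$BIN <- cut(d$POS, breaks = breaks, right = FALSE)
  d
}

df_split <- split(df, f = df$CHR)      # one data frame per chromosome
bin_1 <- lapply(df_split, bins)        # add the BIN column to each piece
df_binned <- do.call(rbind, bin_1)     # stack the pieces back together
```

Here the function returns the whole data frame with the new column instead of only the factor, so the per-chromosome pieces can be recombined with rbind. Whether bins should start at each chromosome's minimum or at a fixed origin such as 0 depends on the analysis; this sketch follows the question and starts at the minimum.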