I have several data frames containing results for differentially expressed proteins, and I would like to mark which proteins exceed certain significance thresholds (e.g. logFC >= 1 and adjusted p <= 0.05 as up_0.05, or adjusted p <= 0.01 as up_0.01). Using ifelse I can do this for each data frame individually, but since I have many data frames to process this way, a function would be much cleaner.
A similar question has been asked (dplyr - mutate: use dynamic variable names), but I was not able to translate its answers into a solution for my problem. I would appreciate it very much if you could correct my function's code so that it works (example data provided).
Thanks a lot!
sample data
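p.vals <- seq(from=0, to=1, by=.0001)
# include negative fold changes so the "down" thresholds can also be reached
logFCs <- seq(from=-4, to=4, by=.1)
# LETTERS has only 26 entries, so build 1000 unique protein IDs instead
diffEx_proteins <- data.frame(protein=paste0("P", 1:1000),
                              adj.P.Val=sample(p.vals, size=1000, replace=TRUE),
                              logFC=sample(logFCs, size=1000, replace=TRUE))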
function
mark_significants <- function(comparison){
comparison$paste0(comparison, "up_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1, TRUE, FALSE)
comparison$paste0(comparison, "up_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1, TRUE, FALSE)
}
usage
mark_significants(diffEx_proteins)
I get the error "Error in mark_significants(diffEx_proteins) : invalid function in complex assignment".
I would like to get the data frame back with 4 added logical columns, indicating whether each protein reaches the defined threshold levels.
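For reference, here is a minimal sketch of what the intended result might look like. It rests on a few assumptions: the error most likely arises because `comparison$paste0(...) <- value` is parsed as an assignment to a function call rather than to a named column, so the sketch uses `[[<-` with a computed name instead; the `prefix` argument and the small `toy` data frame are hypothetical and only there for illustration; and the function returns the modified copy, which the original never did.

```r
# Sketch only: flags proteins that pass the logFC and adjusted p-value cutoffs.
# `prefix` is a hypothetical argument for an optional column-name prefix.
mark_significants <- function(comparison, prefix = "") {
  up   <- comparison$logFC >= 1
  down <- comparison$logFC <= -1
  for (alpha in c(0.05, 0.01)) {
    sig <- comparison$adj.P.Val <= alpha
    # [[<- accepts a column name built at run time; $<- does not
    comparison[[paste0(prefix, "up_", alpha)]]   <- sig & up
    comparison[[paste0(prefix, "down_", alpha)]] <- sig & down
  }
  comparison  # return the modified copy so the caller can keep it
}

# Hypothetical toy input with the same column names as the sample data
toy <- data.frame(protein   = c("A", "B", "C"),
                  adj.P.Val = c(0.001, 0.03, 0.2),
                  logFC     = c(2, -1.5, 3))
marked <- mark_significants(toy)
```

Because the function returns a new data frame, the result has to be assigned, e.g. `diffEx_proteins <- mark_significants(diffEx_proteins)`. With many data frames collected in a list, `lapply(list_of_dfs, mark_significants)` would apply it to each one.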