Inverse normal transformation for a GWAS dataset

Question

I have a gene count dataset that looks like this:

I am trying to apply an inverse normal transformation (INT) to this dataset, but the results come out like this:

I want to apply it across samples. That is, I want an output with samples as columns and genes as rows, where the INT is applied across samples for each gene.

inormal <- function(x)
{
    data <<- (qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x))))
}

data:

structure(list(
    GTEX.1117F = c(4.85944, 1.67961, 0.29352, 0.09784, 4.25609, 0.99472),
    GTEX.1128S = c(1.54004, 0.55209, 0.31963, 0.72643, 4.09708, 0.37775)),
    row.names = c("ENSG00000227232.5", "ENSG00000268903.1", "ENSG00000269981.1",
                  "ENSG00000241860.6", "ENSG00000279457.4", "ENSG00000228463.9"),
    class = "data.frame")
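
For reference, here is a minimal sketch of the same rank-based formula written so that the function returns its result rather than writing to a global object. The vector `x` reuses the `GTEX.1117F` values from the data above; the trailing `NA` is a hypothetical addition, included only to show how `na.last = "keep"` behaves.

```r
# Rank-based inverse normal transform: same formula as above, but the
# value is returned to the caller instead of being assigned with `<<-`.
inormal <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}

# GTEX.1117F values from the data above, plus one hypothetical NA
x <- c(4.85944, 1.67961, 0.29352, 0.09784, 4.25609, 0.99472, NA)

z <- inormal(x)  # NA stays NA; the other values become normal quantiles
print(z)
```

The transformed values keep the ordering of the input, and the missing value is passed through untouched because it is excluded from both the ranking and the denominator.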

It would be easier to help if you could provide a reproducible data set using `dput()` along with the intended outcome. Offhand, I don't understand why you are creating a function and using `<<-` - this is usually a bad idea, unless you have a very specific reason (see [here](https://stackoverflow.com/a/2630222/5221626)). Why not just the following? `inormal <- function(x) { qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x))) }` and then run some `dplyr` code: `mutate(mydf, across(everything(), inormal))` — Phil, Oct 11 '22 at 05:25
Okay, I will remove the `<<-`. I don't want to mutate my df. `dput(head(gene_exp[, c(1, 3)])) structure(list(GTEX.1117F = c(4.85944, 1.67961, 0.29352, 0.09784, 4.25609, 0.99472), GTEX.1128S = c(1.54004, 0.55209, 0.31963, 0.72643, 4.09708, 0.37775)), row.names = c("ENSG00000227232.5", "ENSG00000268903.1", "ENSG00000269981.1", "ENSG00000241860.6", "ENSG00000279457.4", "ENSG00000228463.9"), class = "data.frame")` — Katherin Wright, Oct 11 '22 at 15:21
"i don't want to mutate my df." - Then you're at an impasse. How could you transform your data without making changes to it? — Phil, Oct 11 '22 at 15:43
I just want to create a new df containing the inverse normal transformed data. — Katherin Wright, Oct 11 '22 at 17:37
Then assign it to a new object: `mydf2 <- mutate(mydf, across(everything(), inormal))` — Phil, Oct 11 '22 at 17:51
But I want to do it across samples for each gene. When I run the above code, it generates an output for each sample. How can I modify it so that it works across samples for each gene? — Katherin Wright, Oct 11 '22 at 17:55
I'd need a better understanding of what you want your final outcome to look like. You have only provided your starting data frame so far. I'm not a biogeneticist, so I don't know whether the samples/genes refer to cases or variables in the data frame. — Phil, Oct 11 '22 at 18:19
Both samples and genes are variables??? Again, I can't help any further without an indication as to what the outcome would look like. — Phil, Oct 11 '22 at 20:41
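
The intended outcome was never pinned down in the comments, so the following is only a sketch of one reading of the request: treat each row (gene) as the vector to transform, rank its values across the sample columns, and store the result in a new data frame while leaving the original untouched. It uses base R's `apply()` rather than the `dplyr` call suggested above, and the names `m`, `res`, and `int_by_gene` are made up for illustration.

```r
# Rank-based inverse normal transform, returning its result
inormal <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}

# The two-sample excerpt posted in the question
gene_exp <- data.frame(
  GTEX.1117F = c(4.85944, 1.67961, 0.29352, 0.09784, 4.25609, 0.99472),
  GTEX.1128S = c(1.54004, 0.55209, 0.31963, 0.72643, 4.09708, 0.37775),
  row.names = c("ENSG00000227232.5", "ENSG00000268903.1", "ENSG00000269981.1",
                "ENSG00000241860.6", "ENSG00000279457.4", "ENSG00000228463.9")
)

m <- as.matrix(gene_exp)

# apply(m, 1, f) calls f once per row (gene) and returns the results as
# columns, so transpose to get genes as rows and samples as columns again.
res <- t(apply(m, 1, inormal))
dimnames(res) <- dimnames(m)  # restore gene and sample labels explicitly

# New object; gene_exp itself is not modified
int_by_gene <- as.data.frame(res)
print(int_by_gene)
```

With only the two sample columns from the excerpt, every gene simply receives the 25% and 75% normal quantiles, so the result becomes informative only when run on the full set of samples.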
