I am looking for answers similar to these posts: "R: Replace multiple values in multiple columns of dataframes with NA" and "Multiple replacement in R".

My dataframe my.df contains NAs.

dput(my.df)
structure(list(`AICAR (GDSC1:1001)_GDSC1` = c(10.1253052794007, 
NA, NA, NA, NA, NA, 9.3362273693641, NA, NA, NA), `vinblastine (GDSC1:1004)_GDSC1` = c(-5.56689193211021, 
NA, NA, NA, NA, NA, -3.49808657768651, NA, NA, -5.7323006155361
), `cisplatin (GDSC1:1005)_GDSC1` = c(3.20680858158152, NA, NA, 
NA, NA, NA, NA, NA, NA, NA), `cytarabine (GDSC1:1006)_GDSC1` = c(-1.29089026889862, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), `docetaxel (GDSC1:1007)_GDSC1` = c(-9.21190331946225, 
NA, NA, NA, NA, NA, NA, NA, NA, -6.51430196744496), `methotrexate (GDSC1:1008)_GDSC1` = c(NA, 
NA, NA, NA, NA, NA, -4.96153980941858, NA, NA, NA), `gefitinib (GDSC1:1010)_GDSC1` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, -4.65609368323825), `navitoclax (GDSC1:1011)_GDSC1` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), `vorinostat (GDSC1:1012)_GDSC1` = c(-0.1834250603902, 
1.80666265545084, 0.503152683902549, 1.78569632218743, NA, 1.01934567070847, 
0.321867836558935, NA, 2.18003424956055, 0.143794452798708)), row.names = c(NA, 
10L), class = "data.frame")

I get the cell location of each NA using `idx <- my.df %>% lapply(., function(x) which(is.na(x)))`. I then convert these NAs to 0 with `my.df %>% mutate_if(., is.numeric, ~replace(., is.na(.), 0))` before I calculate correlations. Now, how can I put the NAs back into their original cells based on `idx`?

I reckon loops, tidy, purrr or something similar can do this fast? It would be great if the column names of my.df could be matched against the names of idx as a quality-control step in the code.
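
A minimal sketch of the round trip described above, using a small toy data frame in place of my.df. The names `df`, `df0` and `restored` are hypothetical, and base R stands in for the dplyr calls so the example runs without extra packages. It assumes the rows and columns keep their order between recording the indices and restoring the NAs.

```r
# Toy stand-in for my.df (hypothetical column names and values)
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# Record the NA row positions of every column, as in the lapply call
idx <- lapply(df, function(x) which(is.na(x)))

# Replace the NAs with 0 (base-R equivalent of the mutate_if step)
df0 <- df
df0[is.na(df0)] <- 0

# ... work on df0 here; rows and columns must keep their order ...

# Quality control: the index list must describe exactly these columns
stopifnot(identical(names(idx), names(df0)))

# Put NA back at the recorded positions, column by column
restored <- df0
restored[] <- Map(function(col, i) replace(col, i, NA), df0, idx)

# Compare the restored data with the original
all.equal(restored, df)
```

`Map` pairs each column with its own index vector by position, so the `stopifnot` check on the names is what catches a reordered or renamed column before anything is overwritten.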

Thanks!

Seymoo
  • Do not use `lapply`. Use `idx <- is.na(my.df)`, then do whatever you want to `my.df`. Once done, you can do `is.na(my.df) <- idx`. – Onyambu Nov 24 '22 at 10:55
  • Or... use `cor(..., use = 'pairwise.complete.obs')` to calculate the correlation and leave your data unaffected? Your correlation is likely biased after imputation, unless you know beforehand that the NAs are in fact 0s, in which case they should be replaced and not returned. – Oliver Nov 24 '22 at 11:02
  • Thanks for the suggestion. If I do the first alternative I get `Error in `[<-.data.frame`(`*tmp*`, value, value = NA) : unsupported matrix index in replacement` – Seymoo Nov 24 '22 at 12:51
  • Could you give an example of the expected output for 'how can I return the NAs into their dedicated cells based on their idx'? – Waldi Nov 26 '22 at 10:27
  • If you consider the example above, I turn the NAs in the cells to 0 and do some stuff with the data, but the location of the cells stays the same. So now I want to insert the NAs back into their original cells. – Seymoo Nov 30 '22 at 09:35
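
The two suggestions from the comments can be sketched as follows, again on a toy data frame; `df`, `df0` and `na_mask` are hypothetical names. The guard checks that the mask is a logical matrix with the same shape and column names as the data, which is an assumption about what the `is.na<-` assignment needs. A mask of another type or shape is one possible cause of the error quoted above, though that is a guess.

```r
# Toy stand-in for my.df (hypothetical values)
df <- data.frame(x = c(1, 2, NA, 4, 5),
                 y = c(2, NA, 6, 8, 11),
                 z = c(5, 3, 4, NA, 1))

# Logical matrix with the same dimensions as df; TRUE marks an NA cell
na_mask <- is.na(df)

# Temporary zero fill
df0 <- df
df0[na_mask] <- 0

# ... work on df0 here without reordering rows or columns ...

# Guard against a mask that no longer lines up with the data
stopifnot(is.logical(na_mask),
          identical(dim(na_mask), dim(df0)),
          identical(colnames(na_mask), names(df0)))

# Put the NAs back into their original cells
is.na(df0) <- na_mask
all.equal(df0, df)

# Alternative: skip the zero fill and let cor() use the complete pairs only
cor_pairwise <- cor(df, use = "pairwise.complete.obs")
```

With `use = "pairwise.complete.obs"`, each pairwise correlation is computed from the rows where both columns are observed, so the original data never has to be modified.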
