
I would like to transform my data, but I am not sure which method is best, so I am using the package "bestNormalize".

It works fine on a single column of a dataframe. However, I have a list of two dataframes (each with 9 columns), and I would like to apply the function "bestNormalize" to each column. I tried purrr::map, but it does not work.

Further, I would like to apply other functions of the package (data transformations, e.g. the function "yeojohnson") to each column of each dataframe in the same way as "bestNormalize".

Does anybody know how this works? Thanks in advance.


install.packages("bestNormalize")
library(bestNormalize)

install.packages("purrr")
library(purrr)
library(dplyr)  # provides mutate_at() and vars() used below

# Data
a <- data.frame(
  met1 = rnorm(n = 100, mean = 0, sd = 1),
  met2 = rnorm(n = 100, mean = 0, sd = 1),
  met3 = rnorm(n = 100, mean = 0, sd = 1),
  met4 = rnorm(n = 100, mean = 0, sd = 1),
  met5 = rnorm(n = 100, mean = 0, sd = 1),
  met6 = rnorm(n = 100, mean = 0, sd = 1),
  met7 = rnorm(n = 100, mean = 0, sd = 1),
  met8 = rnorm(n = 100, mean = 0, sd = 1),
  met9 = rnorm(n = 100, mean = 0, sd = 1)
)


y <- data.frame(
  met1 = rnorm(n = 100, mean = 0, sd = 1),
  met2 = rnorm(n = 100, mean = 0, sd = 1),
  met3 = rnorm(n = 100, mean = 0, sd = 1),
  met4 = rnorm(n = 100, mean = 0, sd = 1),
  met5 = rnorm(n = 100, mean = 0, sd = 1),
  met6 = rnorm(n = 100, mean = 0, sd = 1),
  met7 = rnorm(n = 100, mean = 0, sd = 1),
  met8 = rnorm(n = 100, mean = 0, sd = 1),
  met9 = rnorm(n = 100, mean = 0, sd = 1)
)


my_list <- list(a, y)
 
# Works:
bestNormalize::bestNormalize(my_list[[1]]$met1)

# Does not work:
stand_dat_men <- my_list  %>% purrr::map(~mutate_at(.x, .vars = vars(met1:met9), ~bestNormalize(.)))





Jzlia10

1 Answer


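bestNormalize returns an object of class "bestNormalize" rather than a plain numeric vector, so wrap it in list() to store it in a dataframe column. Since each column then yields a single object, you can use summarise instead of mutate here.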

library(dplyr) 
library(bestNormalize)

output <- purrr::map(my_list, ~.x %>% summarise_at(vars(met1:met9), 
                               ~list(bestNormalize(.))))

summarise_at has since been superseded by across:

output <- purrr::map(my_list, ~.x %>% summarise(across(met1:met9, 
                               ~list(bestNormalize(.)))))
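
As a rough sketch of how the stored objects could be pulled back out: each element of the output is a one-row dataframe whose columns are list-columns, so a fitted object has to be unwrapped with [[1]]. The toy data, the seed and the name fit are made up for illustration, and chosen_transform and x.t are the elements a bestNormalize object is assumed to carry here.

```r
library(dplyr)
library(purrr)
library(bestNormalize)

set.seed(1)  # illustrative seed, not from the original post
toy <- data.frame(met1 = rexp(50), met2 = rexp(50))  # hypothetical data
my_list <- list(toy, toy)

# One fitted object per column, stored in a list-column
output <- map(my_list, function(df) {
  summarise(df, across(met1:met2, ~ list(bestNormalize(.x))))
})

# [[1]] unwraps the single cell of the list-column
fit <- output[[1]]$met1[[1]]
class(fit$chosen_transform)  # class of the selected transformation
head(fit$x.t)                # transformed values
```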
Ronak Shah
  • Thanks. It works, but the problem is the output. I am doing this because I would like to run a regression after the transformation. My real list contains 10 dataframes with further (untransformed) variables, and I would like the output to be a list of 10 dataframes. Now I have a list of 10 dataframes, and each dataframe contains a list... How can I extract (unlist) the output? as.data.frame does not work. Note: instead of bestNormalize I used the "yeojohnson" function of the package. – Jzlia10 Jul 17 '20 at 07:58
  • @Jzlia10 I have answered the question about the `bestNormalize` function. What output do you want from `bestNormalize::bestNormalize(my_list[[1]]$met1)`? – Ronak Shah Jul 17 '20 at 10:39
  • I want the variable "x.t" in a dataframe. For this I created a function: function(x) { fit <- yeojohnson(x); x.t <- fit$x.t; return(x.t) }. It worked. :) Thanks again. – Jzlia10 Jul 17 '20 at 13:27
  • At first I used the bestNormalize function. Based on the results, I applied yeojohnson to transform the data. In the end, that is what I need. I thought the "procedure" would be the same for both functions. – Jzlia10 Jul 17 '20 at 13:35
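
The helper described in the last comments can probably be folded into mutate, so that each dataframe keeps its shape and its untransformed variables. This is only a minimal sketch, assuming (as the comment above reports) that the x.t element of the fitted object holds the transformed values. The toy data and the names make_df, yj_transform and other are hypothetical.

```r
library(dplyr)
library(purrr)
library(bestNormalize)

set.seed(42)  # illustrative seed
# Hypothetical data: two columns to transform plus one untouched variable
make_df <- function() {
  data.frame(met1 = rexp(100), met2 = rexp(100), other = rnorm(100))
}
my_list <- list(make_df(), make_df())

# Fit Yeo-Johnson on a vector and return only the transformed values
yj_transform <- function(x) yeojohnson(x)$x.t

# mutate() overwrites met1:met2 in place and leaves `other` as it was
transformed <- map(my_list, function(df) {
  mutate(df, across(met1:met2, yj_transform))
})

str(transformed[[1]])
```

The result is again a list of plain dataframes, so each element can be passed to a regression function such as lm directly.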