
I have this summary dataframe (from this question):

lst <- lapply(1:ncol(mtcars), function(i){
  x <- mtcars[[i]]
  data.frame(
    Variable_name = colnames(mtcars)[[i]],
    sum_unique = NROW(unique(x)), 
    NA_count = sum(is.na(x)), 
    NA_percent = round(sum(is.na(x))/NROW(x), 2))
})
do.call(rbind, lst)

I want to add the five highest and the five lowest values of each column:

lst <- lapply(1:ncol(mtcars), function(i){
  x <- mtcars[[i]]
  data.frame(
    variable_name = colnames(mtcars)[[i]],
    distinct = NROW(unique(x)), 
    NA_count = sum(is.na(x)), 
    NA_percent = round(sum(is.na(x))/NROW(x),2),
    first_5 = paste0(sort(x, decreasing=TRUE)[1:5],";"),
    last_5 = paste0(sort(x)[1:5],";")
  )   
})
do.call(rbind, lst)

But it creates a new row for each of the first_5 and last_5 values. Why does this happen, and how can I solve it?

Chris
  • Hi, could you be more precise about the output you want? You want to add the five highest and lowest values for each column; what does that mean? 10 more rows, i.e. 5 highest-value rows and 5 lowest-value rows? – Félix Cuneo Oct 22 '19 at 06:45
  • And the maximum values of what? mpg, I would say? – Félix Cuneo Oct 22 '19 at 07:03
  • If you want the 5 highest/lowest values on the same line, use the argument `collapse=";"`, as in: `paste0(sort(x)[1:5], collapse=";")` – Etienne Kintzler Oct 22 '19 at 07:45
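Applied to the full loop from the question, the `collapse` suggestion could look like the sketch below. It assumes that one `";"`-separated string per cell is the desired output, and it swaps `[1:5]` for `head(..., 5)` (a small variant not in the original) so that a column with fewer than five values would not produce `NA`s. The name `summary_df` is only illustrative.

```r
# Per-column summary of mtcars: each lapply() call returns a one-row
# data frame, because collapse = ";" reduces the five values to one string.
lst <- lapply(1:ncol(mtcars), function(i) {
  x <- mtcars[[i]]
  data.frame(
    variable_name = colnames(mtcars)[[i]],
    distinct = NROW(unique(x)),
    NA_count = sum(is.na(x)),
    NA_percent = round(sum(is.na(x))/NROW(x), 2),
    # five largest values, joined into a single string
    first_5 = paste0(head(sort(x, decreasing = TRUE), 5), collapse = ";"),
    # five smallest values, joined into a single string
    last_5 = paste0(head(sort(x), 5), collapse = ";")
  )
})
summary_df <- do.call(rbind, lst)
summary_df
```

The result has one row per column of `mtcars` (11 rows), and for `mpg` the `first_5` cell is `33.9;32.4;30.4;30.4;27.3`.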

1 Answer


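You are almost there. Because you have five numbers for a single cell, paste0() on its own cannot do the job: it is vectorized, so paste0(sort(x)[1:5], ";") returns five separate strings, and data.frame() recycles the other (length-1) columns to match, which gives five rows per variable. One solution is to join the values with toString() first, like this: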

lst <- lapply(1:ncol(mtcars), function(i){
  x <- mtcars[[i]]
  data.frame(
    variable_name = colnames(mtcars)[[i]],
    distinct = NROW(unique(x)), 
    NA_count = sum(is.na(x)), 
    NA_percent = round(sum(is.na(x))/NROW(x),2),
    first_5 = paste0(toString(sort(x, decreasing=TRUE)[1:5]),";"),
    last_5 = paste0(toString(sort(x)[1:5]),";")
  )   
})
do.call(rbind, lst)
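To see what `toString()` contributes, compare the two expressions on the `mpg` column. Without it, `paste0()` returns one string per value; with it, the five values are first joined into one comma-separated string (`toString(x)` is essentially `paste(x, collapse = ", ")`). The variable names below are only for illustration.

```r
x <- mtcars$mpg
top5 <- sort(x, decreasing = TRUE)[1:5]

# Without toString(): paste0() is vectorized, giving one string per element
without <- paste0(top5, ";")
length(without)  # 5
without          # "33.9;" "32.4;" "30.4;" "30.4;" "27.3;"

# With toString(): the five values are joined into one string first
with_ts <- paste0(toString(top5), ";")
length(with_ts)  # 1
with_ts          # "33.9, 32.4, 30.4, 30.4, 27.3;"
```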
Zhiqiang Wang
  • It worked! But can you explain the process? I still don't understand why it doesn't work without toString(). – Chris Oct 22 '19 at 14:40
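The extra rows come from how `data.frame()` combines arguments of different lengths: a length-1 argument is recycled to the length of the longest one. A minimal illustration with arbitrary numbers (only the column names mirror the question):

```r
# One length-1 column and one length-5 column:
# data.frame() recycles "mpg" so that it fills five rows.
df_rows <- data.frame(
  variable_name = "mpg",
  first_5 = paste0(c(5, 4, 3, 2, 1), ";")
)
nrow(df_rows)  # 5

# Collapsing the values first keeps every column at length 1,
# so the data frame has a single row.
df_one <- data.frame(
  variable_name = "mpg",
  first_5 = paste0(c(5, 4, 3, 2, 1), collapse = ";")
)
nrow(df_one)   # 1
```

In the question's code, every lapply() call therefore returned a five-row data frame, and do.call(rbind, lst) stacked them into 5 rows per column of mtcars.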