I'm experimenting with the quantile function in independent dataframes.

A very easy example to illustrate my case:

#get quartiles

quantile(x <- rnorm(1001))

0%          25%          50%          75%         100% 
-2.930587810 -0.687108751  0.004405246  0.644589258  2.839597566 

#split the quantile results into 5 independent objects, for example:

list2env(setNames(as.list(quantile(x)), paste0("Q", 0:4)), .GlobalEnv)

So now I have each quantile result in its own object, named after its corresponding quartile number: Q0, Q1, Q2, Q3, Q4.

Now I'd like to apply the same idea to a "Large list" (large_list) with more than 400 elements, so I guess I need a different approach: a function that I can apply to all 400 elements of my list.

This is where I need the community's help. Here is my approach:

# For each element of the list, create a new column named
# Elementname.Quartilenumber that records which quartile
# (Q1, Q2, Q3, Q4) each value belongs to.
# Assumes the first column of each element holds the data.

Qnumber <- function(x) {
  element_name <- stringi::stri_extract(names(x)[1], regex = "^[A-Z]+")
  new_column <- paste0(element_name, ".Quartilenumber")
  # Quartile break points of the data column
  qx <- quantile(x[[1]])
  # include.lowest = TRUE keeps the minimum in Q1 instead of NA
  x[[new_column]] <- cut(x[[1]], qx, labels = paste0("Q", 1:4), include.lowest = TRUE)
  return(x)
}

Any help would be much appreciated.

Thank you very much.

Rick
  • I guess you are looking for `cut(x <- rnorm(1001), qx <- quantile(x), labels = names(qx)[-1])` (?) and would be better off forgetting list2env exists. – Frank Sep 21 '18 at 15:21
  • Thanks Frank. Imagine that you have a list of 400 elements loaded into your environment called "large_list", and each of these 400 elements has a column called "element_data". My point is to calculate which quartile (Q1, Q2, Q3, Q4...) each value of "element_data" belongs to, in a new column that should be called "Element.Quartilenumber". The goal is to create 400 columns of Q numbers (one for each element of the list) with a function. Any approach? Sorry if I didn't explain it very well before. :-) – Rick Sep 21 '18 at 15:41
  • Hm, I think if element means vector, with all of them having the same length, then convert the list to a data.frame and add more columns like `DF[, paste0(element_names, ".Quartilenumber")] <- lapply(DF[, element_names], f)` where f is the transformation you want to make. It's hard to be more specific without a concrete example. Some guidance on that here, if you're interested: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Sep 21 '18 at 15:50
  • Thanks Frank. I have created a new example, maybe you have time and are so kind to have a look at it and give me your thoughts. https://stackoverflow.com/questions/52466994/function-to-automatically-create-vector-in-a-large-list-for-each-element-of-the – Rick Sep 23 '18 at 14:18

1 Answer

For each element in your list, do the following:

  1. calculate the quantiles, as you have done: qx <- quantile(x)

  2. count how many of the quantile values each datum is greater than or equal to: sum(x[i] >= qx); this count is the quartile number in all but one case: the maximum value, whose count is 5 (you get NA for this one, because there is no 5th quartile name to index)

  3. set the quartile of the maximum value to the 4th quartile ('Q4').
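
To make the three steps concrete, here is a minimal sketch on a single small vector. The vector x and its values are made up for illustration; the counting rule is the same one used in the loop in this answer.

```r
# Hypothetical data, chosen so the quartiles are easy to check by hand
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
qnames <- c("Q1", "Q2", "Q3", "Q4")

# Step 1: the five quantile values (0%, 25%, 50%, 75%, 100%)
qx <- quantile(x)

# Step 2: for each datum, count the quantile values it is >= to
counts <- sapply(x, function(v) sum(v >= qx))
qnum <- qnames[counts]   # the maximum has count 5, so qnames[5] is NA

# Step 3: assign the maximum to the 4th quartile
qnum[is.na(qnum)] <- "Q4"

print(data.frame(x, counts, qnum))
```

Only the maximum (9) gets a count of 5, which is why it is the one value that needs the NA fix in step 3.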

Here are some fake data (a list of data frames):

list.1 <- list()
for (i in 1:5) {
    list.1[[i]] <- data.frame('elem_data'=rnorm(10))
}

Step through the list of data.frames and add the quartile column.

qnames <- c('Q1','Q2','Q3','Q4')
for (i in 1:5) {
    qx <- quantile(list.1[[i]]$elem_data)
    list.1[[i]]$qnum <- sapply(list.1[[i]]$elem_data, function(x) qnames[sum(x >= qx)])
    list.1[[i]]$qnum[is.na(list.1[[i]]$qnum)] <- qnames[4]
}
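
As a possible alternative to the explicit loop, the same labelling can be sketched with findInterval() and lapply(). This is not part of the approach above; quartile_label and list.2 are hypothetical names introduced here, and the sketch assumes the data contain no NA values.

```r
# Hypothetical helper: label each value of a numeric vector with its
# quartile, using the same >= rule as the loop
quartile_label <- function(v) {
  qx <- quantile(v)
  # findInterval() returns i where qx[i] <= v < qx[i + 1];
  # rightmost.closed = TRUE puts the maximum in interval 4 instead of 5
  paste0("Q", findInterval(v, qx, rightmost.closed = TRUE))
}

set.seed(1)  # arbitrary seed so the fake data are reproducible
list.2 <- lapply(1:5, function(i) data.frame(elem_data = rnorm(10)))

# Add the quartile column to every data frame in the list
list.2 <- lapply(list.2, function(df) {
  df$qnum <- quartile_label(df$elem_data)
  df
})

str(list.2[[1]])
```

Because findInterval() is vectorised, this avoids calling a function once per data value, and the maximum needs no separate NA fix.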

I tried this with a list of 1000 data.frames with 1000 data elements each, and it took about 2.5 seconds (on a mid-2013 MacBook Air).
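
For anyone who wants to repeat the timing, a sketch along these lines should work. big_list, n_frames and n_rows are names made up here, and the elapsed time will depend on the machine.

```r
n_frames <- 1000  # number of data frames in the list
n_rows <- 1000    # data elements per data frame
big_list <- lapply(seq_len(n_frames), function(i) data.frame(elem_data = rnorm(n_rows)))
qnames <- c("Q1", "Q2", "Q3", "Q4")

# Time the same loop as above over the larger list
elapsed <- system.time({
  for (i in seq_along(big_list)) {
    qx <- quantile(big_list[[i]]$elem_data)
    big_list[[i]]$qnum <- sapply(big_list[[i]]$elem_data, function(x) qnames[sum(x >= qx)])
    big_list[[i]]$qnum[is.na(big_list[[i]]$qnum)] <- qnames[4]
  }
})

print(elapsed[["elapsed"]])  # wall-clock seconds
```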

Edward Carney
  • Thanks Edward, I have created a new detailed example here, maybe you are so kind to have a look and give me your comments. https://stackoverflow.com/questions/52466994/function-to-automatically-create-vector-in-a-large-list-for-each-element-of-the – Rick Sep 23 '18 at 14:17