I have code that computes the median and the first and third quartiles of each column for one data set:

quant0 = c(0.5)
# median() has no probs argument, so use quantile() with probs = 0.5
Median = apply(data1[2:1000], 2, quantile, probs = quant0, na.rm = TRUE)

quant1 = c(0.25)
firstQuartiles = apply(data1[2:1000], 2, quantile, probs = quant1, na.rm = TRUE)

quant2 = c(0.75)
thirdQuartiles = apply(data1[2:1000], 2, quantile, probs = quant2, na.rm = TRUE)

I have multiple datasets in the same format as the one I used for the code above. This is what all the data frames look like:

    Type    x1      x2      x3      ...
1:  type1   1.54    1.48    1.88
2:  type2   1.46    1.99    1.48
3:  type1   2.01    1.02    1.03
...

I am a novice at writing functions. The other data sets I need to apply this function to are in exactly the format shown above; only the number of columns changes. Edit: I did not explain this correctly. I want a function that computes the median, first quartile, and third quartile for each column, for each type.

This is the code I used to do what I specified in the Edit above:

library(dplyr)
# median() has no probs argument, so only na.rm is passed
FactorMedians = data1 %>%
  group_by(Type) %>%
  summarise(across(starts_with('x'), ~ median(.x, na.rm = TRUE)))

I need to turn this into a function that I can use with other, similar datasets.

2 Answers

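Here is an interesting tidyverse solution. In dplyr 1.0.x, summarise() returns multiple rows if the summarising function returns multiple values, and we can then name those rows. (From dplyr 1.1.0 onward, returning more than one row per group from summarise() is deprecated in favour of reframe().)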

library(dplyr)
library(tibble)
iris %>%
  summarise(across(where(is.numeric), 
                   function(x) quantile(x, 
                                        probs = c(0.25, 0.5, 0.75),
                                        na.rm = TRUE))) %>%
  mutate(id = c("first quartile", "median", "third quartile")) %>%
  column_to_rownames("id")

               Sepal.Length Sepal.Width Petal.Length Petal.Width
first quartile          5.1         2.8         1.60         0.3
median                  5.8         3.0         4.35         1.3
third quartile          6.4         3.3         5.10         1.8
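
The same idea can be extended to the per-Type case from the question. What follows is only a sketch under some assumptions: it requires dplyr 1.1.0 or later (for reframe()), the function name quartiles_by_type and the toy data frame are made up for illustration, and the summarised columns are assumed to start with "x" as in the question.

```r
library(dplyr)

# Hypothetical helper: quartiles of every x* column within each Type.
# Assumes dplyr >= 1.1.0, where reframe() is available.
quartiles_by_type <- function(data) {
  data %>%
    group_by(Type) %>%
    reframe(
      # label for each of the three rows produced per group
      stat = c("first quartile", "median", "third quartile"),
      across(starts_with("x"),
             ~ quantile(.x, probs = c(0.25, 0.5, 0.75),
                        na.rm = TRUE, names = FALSE))
    )
}

# Made-up data in the same layout as the question
toy <- data.frame(
  Type = c("type1", "type2", "type1", "type2"),
  x1 = c(1.54, 1.46, 2.01, 1.50),
  x2 = c(1.48, 1.99, 1.02, NA)
)

quartiles_by_type(toy)
```

The result has three rows per Type (one per statistic) and one column per x* variable, so it works unchanged when the number of columns differs between data sets.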
Ben Norris

You can write the function like this:

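library(dplyr)

# Summarise every x* column within each Type.
# na.rm = TRUE is added because quantile() errors on missing values.
apply_fun <- function(data) {
  data %>%
    group_by(Type) %>%
    summarise(across(starts_with('x'), list(med = ~median(., na.rm = TRUE),
                                            first_quartile = ~quantile(., 0.25, na.rm = TRUE),
                                            second_quartile = ~quantile(., 0.5, na.rm = TRUE),
                                            third_quartile = ~quantile(., 0.75, na.rm = TRUE))))
}
result <- apply_fun(data1)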

You can add or remove functions in the list as required.
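
One way to make the list easy to change is to pass it in as an argument. The following is a minimal sketch, not a definitive implementation: the names summarise_by_type, stats, and toy are hypothetical, and the data is made up to match the layout in the question.

```r
library(dplyr)

# Hypothetical variant: the summary functions are supplied as a named list
summarise_by_type <- function(data, fns) {
  data %>%
    group_by(Type) %>%
    summarise(across(starts_with("x"), fns), .groups = "drop")
}

# Made-up data in the same layout as the question
toy <- data.frame(
  Type = c("type1", "type2", "type1", "type2"),
  x1 = c(1.54, 1.46, 2.01, 1.50),
  x2 = c(1.48, 1.99, 1.02, NA)
)

# Named list of summaries; the names become column suffixes
stats <- list(
  med = ~ median(.x, na.rm = TRUE),
  q1  = ~ quantile(.x, 0.25, na.rm = TRUE),
  q3  = ~ quantile(.x, 0.75, na.rm = TRUE)
)

result <- summarise_by_type(toy, stats)
names(result)
```

With a named list, across() names each output column as the input column followed by the list name, for example x1_med and x1_q1, so the result has one row per Type and one column per column/statistic pair.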

Ronak Shah