
I have the following function:

    library(dplyr)

    # median, 25th and 75th percentile of every column whose name starts with 'x', per Type
    apply_fun <- function(data) {
      data %>%
        group_by(Type) %>%
        summarise(across(starts_with('x'), list(median = median,
                                                first_quartile = ~quantile(., 0.25),
                                                third_quartile = ~quantile(., 0.75))))
    }

It gives me the median and the first and third quartiles of each column, for each `Type`, for data sets structured like this:

        Type    x1      x2      x3   ...
    1:  type1   1.54    1.48    1.88
    2:  type2   1.46    1.99    1.48
    3:  type1   2.01    1.02    1.03
    ...

The function produces data like:

            x1_median   x1_first_quartile   x1_third_quartile   x2_first...
    type1   1.505       1.122               ...
    type2   1.488       1.026               ...
    ...     ...
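Because `across()` receives a named list of functions, dplyr names each output column `{.col}_{.fn}` by default, which is why the columns come out as `x1_median`, `x1_first_quartile` and so on. For plotting against the x values it can help to split those names back apart into a long table with one row per `Type` and x value. A minimal sketch, assuming the `tidyr` package is available (it is not used in the original code) and using a small made-up table `summary_wide` with hypothetical columns `x22` and `x24` in place of real output:

```r
library(dplyr)
library(tidyr)

# Made-up table shaped like the output of apply_fun():
# one row per Type, columns named <column>_<statistic>.
summary_wide <- tibble(
  Type               = c("type1", "type2"),
  x22_median         = c(1.5, 1.4),
  x22_first_quartile = c(1.1, 1.0),
  x22_third_quartile = c(1.9, 1.8),
  x24_median         = c(1.6, 1.3),
  x24_first_quartile = c(1.2, 0.9),
  x24_third_quartile = c(2.0, 1.7)
)

summary_long <- summary_wide %>%
  pivot_longer(
    -Type,
    names_to = c("x", "stat"),
    # "x24_first_quartile" -> x = "24", stat = "first_quartile"
    names_pattern = "^x([0-9]+)_(.+)$",
    names_transform = list(x = as.numeric)
  ) %>%
  # one column per statistic: median, first_quartile, third_quartile
  pivot_wider(names_from = stat, values_from = value) %>%
  arrange(Type, x)

print(summary_long)
```

The numeric `x` column is then a ready-made horizontal axis, whatever number the column names start at.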

I have other data sets structured in the same way. I want the function to also plot the medians and quartiles of each type against the x values. The x values are the numbers in the column names, which do not necessarily begin at 1. I want a plot similar to this:

[Image: example plot]

I made that graph for one specific case with this code:

    # one line per row (Type) of FactorMedians
    plot(some_vector, unlist(FactorMedians[1, 2500]), type = "l", las = 1,
         main = "Median values by Factor")
    lines(some_vector, unlist(FactorMedians[2, 2500]), type = "l")
    lines(some_vector, unlist(FactorMedians[3, 2500]), type = "l")
    lines(some_vector, unlist(FactorMedians[4, 2500]), type = "l")
    lines(some_vector, unlist(FactorMedians[5, 2500]), type = "l")

I can't figure out how to write a general form of this.

`FactorMedians` was calculated with this:

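    library(dplyr)

    # median of every column whose name starts with 'x', per Type
    # (median() has no `probs` argument, so `probs = quant0` had no effect)
    FactorMedians = mydata %>%
      group_by(Type) %>%
      summarise(across(starts_with('x'), ~ median(.x, na.rm = TRUE)))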
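One way to generalize the hard-coded `plot()`/`lines()` calls is to derive the x values from the column names inside the function and draw every `Type` at once with base R's `matplot()` and `matlines()`, which plot each column of a matrix as a separate line. This is a swapped-in technique, not the one used in the question. The sketch below wraps it in a hypothetical function `plot_quantiles_by_type()`; it assumes the measurement columns are named `x` followed by digits and that a `Type` column exists, and the demo data is made up:

```r
library(dplyr)

# Hypothetical helper (not from the question): summarise each x column by
# Type, then plot medians (solid) and quartiles (dashed) against the numbers
# taken from the column names. Assumes columns are named like x22, x24, ...
plot_quantiles_by_type <- function(data) {
  x_cols <- grep("^x[0-9]+$", names(data), value = TRUE)
  x_vals <- as.numeric(sub("^x", "", x_cols))
  # Sort by the numeric value so lines are drawn left to right
  ord <- order(x_vals)
  x_cols <- x_cols[ord]
  x_vals <- x_vals[ord]

  stats <- data %>%
    group_by(Type) %>%
    summarise(across(all_of(x_cols), list(
      median         = ~ median(.x, na.rm = TRUE),
      first_quartile = ~ quantile(.x, 0.25, na.rm = TRUE),
      third_quartile = ~ quantile(.x, 0.75, na.rm = TRUE)
    )))

  # Matrix for one statistic: one row per x value, one column per Type
  stat_matrix <- function(stat) {
    m <- t(as.matrix(stats[, paste0(x_cols, "_", stat)]))
    colnames(m) <- as.character(stats$Type)
    m
  }
  med <- stat_matrix("median")
  q1  <- stat_matrix("first_quartile")
  q3  <- stat_matrix("third_quartile")

  cols <- seq_len(ncol(med))
  matplot(x_vals, med, type = "l", lty = 1, col = cols, las = 1,
          ylim = range(q1, q3, na.rm = TRUE), xlab = "x", ylab = "Value",
          main = "Median and quartiles by Type")
  matlines(x_vals, q1, lty = 2, col = cols)
  matlines(x_vals, q3, lty = 2, col = cols)
  legend("topright", legend = colnames(med), col = cols, lty = 1, bty = "n")

  # Return the summary table, as apply_fun() does, without printing it
  invisible(stats)
}

# Usage with made-up data whose columns do not start at x1
set.seed(1)
demo <- data.frame(Type = rep(c("type1", "type2"), each = 10),
                   x22 = rnorm(20), x24 = rnorm(20), x26 = rnorm(20))
res <- plot_quantiles_by_type(demo)
```

Returning the summary invisibly keeps the same table that `apply_fun()` produces, with the plot drawn as a side effect; the number of types and x columns can vary between data sets without changing the code.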

If what I want is not clear, see my previous question, "How can I create a function that computes the median and quartiles for each column of data, for each factor of data?"

  • Can you please provide a reproducible example, including your data and the script for the plot? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. You can share your data using `dput(yourdataframe)` or you can use one of the built-in datasets `library(help = "datasets")` – shiny Nov 12 '20 at 20:26
  • @shiny I have added some extra information since you commented maybe that helps – David Edwards Nov 12 '20 at 21:03

1 Answer


Do you mean something like this?

    # make dummy data
    x <- 1:20
    y <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

    # prepare an empty plot area covering all x and y values
    plot(NULL, xlim = range(x), ylim = range(y), xlab = "X", ylab = "Y")

    # draw one line per column of y; seq_len(n) is the sequence 1, 2, ..., n
    for (i in seq_len(ncol(y))) {
      lines(x, y[, i], type = "l", col = i)
    }

yields

[Image: resulting plot]

RoB
  • Yes, a graph like this, as part of my function – David Edwards Nov 13 '20 at 13:39
  • It seems I must define x. I don't know how to do this in this case. That code gives me the error `Error in plot.default(NULL, xlim = range(x), ylim = range(y), xlab = "X", : object 'x' not found` – David Edwards Nov 13 '20 at 15:08
  • You must replace `x` with whatever you have as your horizontal axis values. In my code, the `x` is equal to your `some_vector` variable – RoB Nov 13 '20 at 15:31
  • I need this, except with an x that changes for each data set – David Edwards Nov 13 '20 at 15:32
  • Then you must store the x's in a list and get the appropriate one with `my_list[[i]]` – RoB Nov 13 '20 at 15:37
  • I want it to be more general than that, so that one does not have to make a list for each data set. So in my function I need to create a vector ranging from the first number that appears in the column names to the last number that appears in the column names. An example of the column names might be "x22, x24, ..., x122" – David Edwards Nov 13 '20 at 15:41
  • I don't understand your issue. If the function takes a data set as input, surely it can also take the corresponding x values? – RoB Nov 13 '20 at 15:46
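On the point raised in the last few comments: if every data set follows the `x<number>` naming scheme, the x values can be read from the column names themselves, so no separate list of x vectors is needed. A minimal sketch with a hypothetical helper `x_from_names()` and made-up data. Since the numbers need not be contiguous (x22, x24, ..., x122), the extracted values, rather than `seq(first, last)`, are what line up one-to-one with the columns; `range()` still gives the first and last number if only the axis limits are needed:

```r
# Hypothetical helper: return the numbers that follow the prefix in the
# column names, in column order (e.g. "x22" -> 22). Non-matching columns
# such as Type are skipped.
x_from_names <- function(data, prefix = "x") {
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  as.numeric(sub(paste0("^", prefix), "", cols))
}

# Made-up data in the question's layout
df <- data.frame(Type = c("type1", "type2"),
                 x22 = c(1.5, 1.4), x24 = c(1.6, 1.3), x122 = c(1.9, 1.8))

x_vals <- x_from_names(df)
print(x_vals)         # [1]  22  24 122
print(range(x_vals))  # [1]  22 122
```

Inside a plotting function, `x_vals` can take the place of `x` (or `some_vector`) in the loop from the answer.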