
I have a dataset containing 120 observations of 6 variables. Five variables are factors and one is my target variable. I need to write a function that creates a matrix for each factor, with each level of the factor as a column, the maximum value of the target variable in the first row, and the minimum value of the target variable in the second row.

I know how to create a matrix, but I am lost when it comes to doing it through a function. Can someone help?

Here is a simple example of what I want to achieve, using a small made-up dataset (image: Example).

As you can see, for each level of the factor (factor 1 in the picture), I want to show the highest and the lowest value of the target.

Here is a subset of my own data:

    > dput(data_plu[1:4, ])
    structure(list(NaNO3 = structure(c(2L, 8L, 8L, 3L), .Label = c("10", 
    "14", "18", "2", "22", "26", "30", "6"), class = "factor"), CaCl2 = structure(c(4L, 
    8L, 8L, 8L), .Label = c("0.1", "0.28", "0.46", "0.64", "0.82", 
    "1", "1.19", "1.37"), class = "factor"), PO4 = structure(c(1L, 
    5L, 5L, 6L), .Label = c("0.1", "0.8", "1.5", "2.2", "2.9", "3.6", 
    "4.3", "5"), class = "factor"), NH4Cl = structure(c(5L, 3L, 3L, 
    6L), .Label = c("0.5", "10.86", "12.93", "15", "2.58", "4.65", 
    "6.72", "8.79"), class = "factor"), MgSO4 = structure(c(4L, 7L, 
    1L, 7L), .Label = c("0.21", "0.35", "0.5", "0.64", "0.79", "0.93", 
    "1.08", "1.22"), class = "factor"), DC = c(15000L, 707500L, 720000L, 
    872500L)), row.names = c(NA, 4L), class = "data.frame")
Sarah_Data
  • Please share a little bit of sample data. I think the `model.matrix` function will get you most of the way there---[see this question or others about one-hot encoding](https://stackoverflow.com/a/4561534/903061)---but I'm a bit confused when you talk about wanting two rows, one with the max and one with the min. Don't you have 120 observations? – Gregor Thomas Nov 05 '19 at 14:35
  • Sorry if my question is not clear. I added a picture to my question to show what I mean with a simpler example. – Sarah_Data Nov 05 '19 at 14:45
  • Hi Sarah, ideally not a picture. You mentioned you know how to create a matrix; could you use that to create a small example of your data? – StupidWolf Nov 05 '19 at 14:47
  • Ah - disregard my comments about `model.matrix` and one-hot encoding, I completely misunderstood. Please [see the FAQ](https://stackoverflow.com/q/5963269/903061) about providing reproducible examples. `dput()` is a nice function to create a copy/pasteable version of data, for example `dput(your_data[1:4, ])` will give us the first 4 rows of your data. – Gregor Thomas Nov 05 '19 at 14:50
  • I added a subset of the data to my question :) – Sarah_Data Nov 05 '19 at 15:11
  • Thanks for the example data. I see columns `NaNO3 CaCl2 PO4 NH4Cl MgSO4 DC`, but most of them seem to be decimals, though they *all* are of class `factor`. Which one is the target, and which one do you want to consider as factors? – Gregor Thomas Nov 05 '19 at 16:48
  • DC is the target, all others are the "factors". They were originally numeric variables; I converted them to characters first, then to factors. – Sarah_Data Nov 06 '19 at 07:01

1 Answer


You may be able to modify this to meet your needs. I wrote a function that handles one factor and then used `lapply` to apply it to all of them. I've called your sample data `dta`:

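# Build a 2-row matrix (Min, Max) of y for each level of the factor x
stats <- function(x, y) {
    minmax <- aggregate(y, list(x), range)   # range() gives c(min, max) per level
    cols <- minmax[, 1]                      # levels that actually occur in x
    result <- as.matrix(t(minmax[, -1]))     # rows = Min/Max, columns = levels
    dimnames(result) <- list(c("Min", "Max"), Levels=as.character(cols))
    return(result)
}
# Apply to every factor column (column 6 is the target, DC)
out <- lapply(dta[, -6], function(x) stats(x, dta$DC))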
head(out, 1)
# $NaNO3
#      Levels
#          14     18      6
#   Min 15000 872500 707500
#   Max 15000 872500 720000
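
The question asks for the maximum in the first row and the minimum in the second, whereas the function above returns Min first. A minimal sketch of one possible variant follows, assuming `tapply` is acceptable in place of `aggregate`. The helper name `max_min` and the variables `target` and `factors` are hypothetical, and `dta` is rebuilt here from the four sample rows in the question so the block runs on its own:

```r
# Rebuild the four sample rows from the question's dput() output
dta <- data.frame(
  NaNO3 = factor(c("14", "6", "6", "18")),
  CaCl2 = factor(c("0.64", "1.37", "1.37", "1.37")),
  PO4   = factor(c("0.1", "2.9", "2.9", "3.6")),
  NH4Cl = factor(c("2.58", "12.93", "12.93", "4.65")),
  MgSO4 = factor(c("0.64", "1.08", "0.21", "1.08")),
  DC    = c(15000L, 707500L, 720000L, 872500L)
)

# Hypothetical helper: one matrix per factor, "Max" row first, then "Min"
max_min <- function(x, y) {
  x <- droplevels(as.factor(x))      # ignore levels with no observations
  rbind(Max = tapply(y, x, max),     # highest target value per level
        Min = tapply(y, x, min))     # lowest target value per level
}

target  <- "DC"                            # name of the target column
factors <- setdiff(names(dta), target)     # every other column is a factor
out <- lapply(dta[factors], max_min, y = dta[[target]])
out$NaNO3
```

On the sample rows, the column for level 6 of `NaNO3` should hold 720000 in the Max row and 707500 in the Min row. Selecting the factor columns by name rather than by position (`-6`) is only a design preference; it keeps working if the column order changes.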
dcarlson
  • Thank you! I tried something else first, and it seems one small detail is not working yet. But I will look at your solution and try to find my error that way! – Sarah_Data Nov 06 '19 at 07:01