Applying functions to dataframe or multiple lists

Question

Edit as per the comments: the OP would like to calculate:

100 * (1 - 10^-(Do - Do[Do==0])) / (1 - 10^-(Do[Do==100] - Do[Do==0])) - Do

For each combination of Cl, In, Sa in the data.frame
-RS

I am trying to apply a function, called dG, to a data frame. Because the function's arguments differ in length, recycling produced unpredictable results.

To rectify this, I split the data frame into lists and tried to apply the dG function (below) to each list, after identifying each list with a helper function called 'ids'.

Please feel free to suggest a different solution. FYI, my specific requests start with bullet points

Please let me start by providing synthetic data that shows the issues:

Do <- rep(c(0,2,4,6,8,10,15,20,30,40,45,50,55,60,65,70,80,85,90,92,94,96,98,100), each=16,times=16)
Cl <- rep(c("K", "Y","M","C"), each= 384, times=4)
In <- rep(c("A", "S"), each=3072)
Sa <- rep(c(1,2), each=1536)
Data <- rnorm(6144)
DataFrame <- cbind.data.frame(Do,Cl,In,Sa,Data); head(DataFrame)
rm(Do,Cl,In,Sa,Data)
attach(DataFrame)

DFSplit <- split(DataFrame[ , "Data"], list(Do, Cl, In, Sa))

The function 'ids' is a helper function that looks up the names of the lists:

ids <- function(Do, Cl, In, Sa){
    grep( paste( "^" , Do, "\\.",
                Cl, "\\.",
                In,
                "\\.", Sa,sep=""),
         names(DFSplit), value = TRUE)}

mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE)

The above mapply produces 6144 lists. If you look at the mapply output, you will notice that there are only 384 unique list names, but each is repeated 16 times (384*16 = 6144).

How can I change the 'ids' function so that mapply doesn't repeat the same name 16 times?

As an ugly and highly costly workaround I used unique; I need a better, more fundamental solution.

unique(mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE))
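
One possible way around the repetition, sketched on a small made-up data frame (`toy`, `keys` and `key_names` are illustrative names, not from the original post). mapply calls ids once per row, so every key combination shows up once per replicate row. Reducing the key columns to their distinct rows first, and pasting the names together directly instead of using grep, gives one name per combination:

```r
# Toy stand-in for the question's data: 2 Do levels x 2 Cl levels,
# 3 replicate rows per combination (12 rows in total).
toy <- data.frame(
  Do   = rep(c(0, 100), each = 3, times = 2),
  Cl   = rep(c("K", "Y"), each = 6),
  Data = seq_len(12)
)

# Same kind of split as DFSplit in the question, with fewer key columns.
toy_split <- split(toy$Data, list(toy$Do, toy$Cl))

# One row per distinct key combination instead of one row per observation.
keys <- unique(toy[, c("Do", "Cl")])

# split() names its pieces by joining the key values with ".",
# so the same names can be built with paste(); no grep() is needed.
key_names <- paste(keys$Do, keys$Cl, sep = ".")

print(key_names)
print(names(toy_split))
```

With the question's data the same idea would use all four key columns (Do, Cl, In, Sa); that extension is an assumption and is not tested here.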

The dG function is the one that I want to operate on each of the 'DFSplit' lists. It has the same issue as the previous ids function, which it uses as an input.

dG <- function(Do,Cl, In, Sa){
    dg <- 100*
                (1-10^-( DFSplit[[ids(Do,  Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) /
                (1-10^-( DFSplit[[ids(100, Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) - Do
    dg}

I tried to use dG as follows and it is not what I want.

dG(Do,Cl, In, Sa)

It only evaluated the LAST part of the dG function (- Do) and gave this warning:

In grep(paste("^", unique(Do), "\.", unique(Cl), "\.", unique(In), : argument 'pattern' has length > 1 and only the first element will be used

Can you suggest a modification to the dG function?
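
A small illustration of what the warning is pointing at, using made-up names (`nm`, `patterns` and `hits` are hypothetical). paste() is vectorised, so passing whole columns produces one pattern per row, while grep() works with a single pattern, as the warning reports. Looping over the patterns, one call per pattern, is what mapply does implicitly:

```r
# Names as split() would produce them for two key columns.
nm <- c("0.K", "100.K", "0.Y")

# paste0() returns one pattern per input element, here two patterns.
patterns <- paste0("^", c(0, 100), "\\.K$")
print(length(patterns))

# Call grep() once per pattern rather than handing it the whole vector.
hits <- vapply(
  patterns,
  function(p) grep(p, nm, value = TRUE),
  character(1)
)
print(unname(hits))
```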

Then I tried mapply

mapply(dG, Do, Cl, In, Sa, SIMPLIFY = FALSE)

mapply correctly evaluated the function with my data. mapply produces 6144 lists. You will notice that the mapply output is basically 384 unique lists, each repeated 16 times 384*16=6144.

How can I modify the dG function to get rid of the useless and time consuming repetition?

My thoughts would be:

Eliminate the repetition in my first function 'ids', which I do not know how to do.
Change the arguments of the second function so that the arguments' lengths would be 384, maybe by using the names of the lists as an input argument, which I do not know how to do.
Change the formula in dG so that it does not use the (Do, Cl, In, Sa) arguments, since each one has a length of 6144.
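
The third idea can be sketched in base R as follows. This is only an illustration under stated assumptions: `tvi_group` is a hypothetical helper, the density values are made up, and each group is assumed to hold exactly one reading at Do == 0 and one at Do == 100. The function receives all rows of one group and is called once per group rather than once per row:

```r
# Hypothetical helper: the formula from the question for ONE group's rows.
# Assumes exactly one reading at Do == 0 and one at Do == 100 in the group.
tvi_group <- function(g) {
  d0   <- g$Data[g$Do == 0]    # baseline reading
  d100 <- g$Data[g$Do == 100]  # solid reading
  100 * (1 - 10^-(g$Data - d0)) / (1 - 10^-(d100 - d0)) - g$Do
}

# Made-up densities: two groups (Cl = K, Y), three Do levels each.
toy <- data.frame(
  Do   = rep(c(0, 50, 100), times = 2),
  Cl   = rep(c("K", "Y"), each = 3),
  Data = c(0.10, 0.50, 1.60, 0.05, 0.40, 1.00)
)

# One call per group, so no key combination is processed twice.
res <- lapply(split(toy, toy$Cl), tvi_group)
print(res)
```

By construction the result is 0 at Do == 0 and at Do == 100 in every group, which is a quick sanity check on the parenthesisation.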

Your code returns `Error: object 'DFSplit' not found`. How about you drastically simplify this into something that is not 384 unique elements long, but more like 3 elements long so people can see what you're actually trying to do. — Chase, Mar 16 '13 at 18:06
@Chase, `DFSplit` is here. http://stackoverflow.com/questions/15449612/why-does-mapply-repeat-the-same-list-multiple-times OP probably still has it in the env and hence overlooking the error — Ricardo Saporta, Mar 16 '13 at 18:10
Sorry Chase, that what I get from copy/paste. It is correct now — Ragy Isaac, Mar 16 '13 at 18:10
And please describe in plain English what you want to achieve, e.g., what function `dG` is supposed to do with your data. (and don't use `attach`) — Roland, Mar 16 '13 at 18:11
Hi @RagyIsaac, My comment on the previous question may have been a little vague. My suggestion was to explain simply your `dG` function (or whatever your **main** goal is). Everything superfluous is cluttering the thought space. — Ricardo Saporta, Mar 16 '13 at 18:12
I agree with everything Ricardo suggested. I think that you are probably making this entirely too complicated. I highly doubt there is a need to split into separate lists first, there are *several* R functions to do group by operations. A subset of them can be found [here](http://stackoverflow.com/questions/10748253/idiomatic-r-code-for-partitioning-a-vector-by-an-index-and-performing-an-operati/10748470#10748470) with timing results. More importantly, I'd recommend reviewing [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit accordingly. — Chase, Mar 16 '13 at 18:22
Hi Roland, this is a well known function in the Graphic Arts, called the "tone value increase" Murray/Davies equation. Details can be found at: http://www.color.org/icc_white_paper5glossary.pdf page 21 — Ragy Isaac, Mar 16 '13 at 18:23
in that case Ragy, I dont think your dG function is giving you your tone value increase — Ricardo Saporta, Mar 16 '13 at 18:29
This is a great example where simply giving folks the necessary information, makes it that much easier for them to be helpful. If you can simply state what you are trying to calculate, folks will offer you a multitude of helpful suggestions of how to go about it. Please dont make people guess what it is you are doing. — Ricardo Saporta, Mar 16 '13 at 18:42

score 5 · Accepted Answer · edited May 23 '17 at 12:21

UPDATE:

The comment you made to @Roland was all you had to put in each of your previous related questions, this one included.

The entirety of your process can be handled in one line of code:

library(data.table)
myDT <- data.table(DataFrame)

myDT[ , "TVI" :=  100 * (1 - 10^-(Data - Data[Do==0])) / (1 - 10^-(Data[Do==100] - Data[Do==0])) 
      , by=list(Cl, In, Sa)]

# this is your Tone Value Increase
myDT$TVI
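
A note on how the grouped expression behaves when a group holds several rows per Do level, as the synthetic data does. Inside each group, `Data[Do==0]` is shorter than `Data` and is recycled by ordinary R rules, so which baseline reading is paired with which row depends on the row order. The sketch below uses made-up densities; averaging the baseline and solid readings is an assumption swapped in for illustration, not the method of the original answer. It also subtracts Do, as in the question's formula:

```r
library(data.table)

# Made-up densities: one group, three Do levels, two replicate rows per level.
toy <- data.table(
  Do   = rep(c(0, 50, 100), each = 2),
  Cl   = "K",
  Data = c(0.10, 0.12, 0.50, 0.55, 1.60, 1.58)
)

# Rows per Do level inside each group. Recycling only lines up cleanly
# when every level has the same count.
counts <- toy[, .N, by = .(Cl, Do)]
print(counts)

# Assumption (not from the original answer): use the mean baseline and the
# mean solid reading so the result does not depend on row order.
toy[, TVI := {
  d0   <- mean(Data[Do == 0])
  d100 <- mean(Data[Do == 100])
  100 * (1 - 10^-(Data - d0)) / (1 - 10^-(d100 - d0)) - Do
}, by = Cl]
print(toy)
```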

original answer:

It's still awfully unclear what you are trying to accomplish. However, here are two concepts that should be able to save you a world of headaches.

First, you do not need your `ids` function. You can get more mileage out of `expand.grid`:

myIDs <- expand.grid(unique(Do), unique(Cl), unique(In), unique(Sa))

# You can then use something like
do.call(paste, c(myIDs, sep="."))
# to get the same names (one "Do.Cl.In.Sa" string per row).  Or whatever other function suits

However, even that is not necessary.

Here is the equivalent of your `dG` function using `data.table`.

Notice there is no need for any of the splitting or ids or anything like that.
Everything is handled by the `by` argument in data.table.

library(data.table)
myDT <- data.table(DataFrame)

myDT

# Compute the whole formula inside each (Cl, In, Sa) group, so that the
# Do == 0 and Do == 100 readings come from the same group as Data.
# Note the parentheses and the negative exponent, as in dG.
dG_DT <- myDT[ , 100 * (1 - 10^-(Data - Data[Do == 0])) /
                       (1 - 10^-(Data[Do == 100] - Data[Do == 0])) - Do
               , by = list(Cl, In, Sa)][ , V1]

dG_DT

Ricardo Saporta, thank you so much, that fully answered my question. Can you explain the details of your solution? However, the formula you provide is total dot area. To get TVI we have to subtract the dot value as follows: myDT[ , "TVI" := (100 * (1 - 10^-(Data - Data[Do==0])) / (1 - 10^-(Data[Do==100] - Data[Do==0]))- Do) , by=list(Cl, In, Sa)] — Ragy Isaac, Mar 16 '13 at 21:05
