function over more than one list

Question

I don't write many functions, but when I do I tend to use an anonymous function with some form of apply. Now, however, I am trying to write a function that works over items in a list.

There are two lists that each have many items (by item I mean e.g. mylist1[1]). All items are dataframes. I want to take the first dataframe from mylist1 and the first dataframe from mylist2 and run a bunch of functions over the columns in those dataframes. Then take the 2nd mylist1 item and the 2nd mylist2 item and so on...

Below is the sort of thing I am used to writing, but it clearly does not work in this case with two lists. Can anyone help me figure out how I should approach this using something other than sapply, which seems to be causing the main problem?

a <- c(1:10)
b <- c(1:10)
z <- c(rep("x", 5), rep("y", 5))
df <- data.frame(cbind(a, b, z))
mylist1 <- split(df, z)
mylist2 <- split(df, z)

myfunction <- function(x, y) 
{

    a <- as.data.frame(x[1])
    b <- as.data.frame(y[1])
    meana <- mean(a[1])
    meanb <- mean(b[1])
    model <- lm(a[1]~b[1])
    return(c(model$coefficients[2], meana, meanb))
}

result <- sapply(mylist1, mylist2, myfunction)

I also wondered: would it be better to subset by z rather than split, and apply the function that way?

What's the difference between `mylist1` and `mylist2`? At the moment, you have both as `split(df, z)`. – A5C1D2H2I1M1N2O1R2T1, Jun 15 '12 at 13:23
In reality they are lists with the same number of elements, but the data frames within them differ in size (different numbers of columns and rows). I just duplicated `mylist1` here for (I thought) simplicity. – user1322296, Jun 15 '12 at 13:28

score 5 · Accepted Answer · answered Jun 15 '12 at 13:16

You are describing exactly the use case for mapply.

result <- mapply(myfunction, x = mylist1, y = mylist2)

Unfortunately, your example function does not work when it is passed two data.frames. Each x and y that mapply hands to it is already a single data.frame (one element of each list), so x[1] and y[1] do not pick out what the function seems to expect.
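As a rough sketch of how the pieces could fit together, here is one way to rewrite the function so that it accepts one data.frame from each list. The second data frame `df2`, the use of the first column of each element, and the result names `slope`, `mean_a` and `mean_b` are assumptions made for illustration, not part of the original question.

```r
# Build the example data with data.frame() directly (no cbind),
# so the numeric columns stay numeric.
z   <- rep(c("x", "y"), each = 5)
df1 <- data.frame(a = 1:10, b = 1:10, z = z)
df2 <- data.frame(a = 2 * (1:10), b = 1:10, z = z)  # hypothetical second data set

mylist1 <- split(df1, df1$z)
mylist2 <- split(df2, df2$z)

# x and y are each ONE data frame: mapply passes one element
# of mylist1 and the matching element of mylist2 per call.
myfunction <- function(x, y) {
  a <- x[[1]]  # first column of x as a plain vector
  b <- y[[1]]  # first column of y as a plain vector
  model <- lm(a ~ b)
  c(slope  = unname(coef(model)[2]),
    mean_a = mean(a),
    mean_b = mean(b))
}

# One column per list element, one row per returned value.
result <- mapply(myfunction, x = mylist1, y = mylist2)
```

Because every call returns a vector of the same length, mapply simplifies the results into a matrix. Passing `SIMPLIFY = FALSE` would keep them as a list instead, which may be easier to work with if the function later returns something more complex.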

answered Jun 15 '12 at 13:16

Ari B. Friedman

Thank you @gsk. I had a closer look at mapply and it seems to be the right one. I see the problem now with the `x[1]` etc. Can you suggest an improvement? – user1322296 Jun 15 '12 at 13:34
@user1322296, I see a few other problems that you need to fix. Your `data.frame(cbind(...` code converts your numeric values to factors. Just use `data.frame` without `cbind`. Some lines in your function can also be reduced. For example you can directly use `meana <- mean(x[[1]][, 1])` (that is, the first column as a vector from the first item in the list) instead of first creating `a`. I *think* also that for your `model`, you might also need to use the same `[[1]][, 1]` structure. Can you post what you expect as the result of this example? – A5C1D2H2I1M1N2O1R2T1 Jun 15 '12 at 16:24
Thank you gsk. I played around more and I get pretty much what I was asking for, but I changed all my notation. I'll try to get to grips with it and see if I can improve it more. – user1322296 Jun 16 '12 at 09:18
@user1322296 To debug `*apply` functions, I find it simpler to create a test dataset that represents the data from a single iteration. Write your function based on that, then run the apply command. For instance, set `x <- mylist1[[1]]` and `y <- mylist2[[1]]`, and write your `myfunction` based on those. You'll quickly see where the errors are coming from. The other way to do this is to make the first line of `myfunction` a call to `browser()` and then run the `mapply` command. – Ari B. Friedman Jun 16 '12 at 12:01
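The single-iteration approach described in these comments might look something like the sketch below. The small stand-in data and the name `bad` are assumptions chosen for illustration. The checks only inspect classes, since that is where the `x[1]` and `cbind` problems show up.

```r
# Small stand-in data, built without cbind() so a and b stay numeric.
df <- data.frame(a = 1:10, b = 1:10, z = rep(c("x", "y"), each = 5))
mylist1 <- split(df, df$z)
mylist2 <- split(df, df$z)

# Pull out what mapply would pass to the function on its first call.
x <- mylist1[[1]]
y <- mylist2[[1]]

class(x)       # x is a data frame, not a list of data frames
class(x[1])    # single brackets keep a one-column data frame
class(x[[1]])  # double brackets return the column as a vector

# For comparison: cbind() builds a matrix first, and because z is
# character, every value in that matrix is coerced to character.
bad <- data.frame(cbind(a = 1:10, z = rep(c("x", "y"), each = 5)))
class(bad$a)   # no longer numeric
```

Once the function body works on `x` and `y` at the top level, the same code should behave the same way inside `mapply`.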