If I have a list of data frames in R, such as:

x<-c(1:10)
y<-2*x
z<-3*x
df.list <- list(data.frame(x),data.frame(y),data.frame(z))

And I'd like to average over a specific column (this is a simplified example) of all these data frames, is there any easy way to do it?

The length of the list is known but dynamic (i.e. it can change depending on run conditions).

For example:

dfone<-data.frame(val=1:10)
dftwo<-data.frame(val=11:20)
dfthree<-data.frame(val=21:30)

(Assume all the column names are val)

row, output
1,    (1+11+21)/3 = 11
2,    (2+12+22)/3 = 12
3,    (3+13+23)/3 = 13

etc

So output[i,1] = (dfone[i,1]+dftwo[i,1]+dfthree[i,1])/3

To do this in a for loop would be trivial:

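dfoutput <- data.frame(val = numeric(nrow(dfone)))  # preallocate the result
for (i in seq_len(nrow(dfone)))  # loop over rows; length(dfone) would count columns
{
  dfoutput[i,'val'] <- (dfone[i,'val']+dftwo[i,'val']+dfthree[i,'val'])/3
}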

But I'm sure there must be a more elegant way?

Henry
  • Which column? Each of your `data.frame`s has only one column, so you might want to create a more representative example. – nrussell Oct 19 '15 at 15:35
  • Do you want an overall average, or one per column in the data frames? What is the desired output? – David Arenburg Oct 19 '15 at 15:36
  • Let's assume we have a column called `value` in each of the data frames. I'd only want to average one column at a time. – Henry Oct 19 '15 at 15:38
  • `lapply(df.list, function(x) mean(x$value))`? – David Arenburg Oct 19 '15 at 15:39
  • That looks good. Will give it a test run and see what I can do. – Henry Oct 19 '15 at 15:40
  • Just do a quick for loop. Something like `for (i in 1:length(df.list)){data1[i]<-df.list[[i]][,yourcolumn]}`. – CCurtis Oct 19 '15 at 15:42
  • The problem is that for loops are very slow (at least in what I've seen in R), so I'm trying to avoid that. – Henry Oct 19 '15 at 15:42
  • @Henry you are correct, especially when growing vectors in for-loops. – Heroka Oct 19 '15 at 15:45
  • I know I'm right - past experience! I had a lovely piece of code that ran in 5 seconds and did 90% of the work. Adding in 3 for loops (2 for aesthetic purposes!) blew that up to 35 seconds.. – Henry Oct 19 '15 at 15:46
  • @DavidArenburg I'm trying your example now. What it has done is an average of the data frames column-wise, rather than row-wise. I.e. for each data frame's first row, I want to take an average of `df[1-5]$x[1]`, rather than average `[1]=df[1]$x_average, [2]=df[2]$x_average`. Do you see what I mean? Thanks! – Henry Oct 19 '15 at 15:47
  • Your description is confusing. This is what I don't understand about these explanations. If you posted "this is what I have now" and "this is what I want", there would be no confusion. Just post the output that should result from the example. – Pierre L Oct 19 '15 at 17:20
  • Maybe ```res <- unlist(lapply(df.list, `[`, "value")) ; rowMeans(matrix(res, ncol = length(df.list)))``` ? That assumes though that the data frames within the list are of same size. – David Arenburg Oct 19 '15 at 18:10
  • Example updated with sample. David - that assumption is fine, and one I'm happy to assume in the code. – Henry Oct 19 '15 at 18:23
  • @Henry Loops are not any slower unless you are indexing a lot inside of them or are creating loops within loops. – CCurtis Oct 19 '15 at 18:39
  • Try `Reduce("+", list(dfone, dftwo, dfthree))/3` where 3 is the length of the `list`, i.e. the number of data.frames we place in the `list`. – akrun Oct 19 '15 at 18:49
  • I thought for loops were inherently slower in R because they use R code whereas most functions are compiled C? – Henry Oct 19 '15 at 18:54
  • I don't understand, did you try my comment or not? It works fine for `df.list <- list(dfone, dftwo, dfthree)`. Regarding loops, they are fine as long as you are not growing objects within the loop. There is not much of a difference between compiled C loop or an R loop, the main question is what are you doing while looping. See [this](http://stackoverflow.com/questions/28983292/is-the-apply-family-really-not-vectorized) for some discussion. – David Arenburg Oct 19 '15 at 19:18
  • Hi David. I haven't tried your code yet, I will do so in the morning and report back when I do. I think your solution probably will do the correct job :) Thanks – Henry Oct 19 '15 at 19:19
  • @Heroka's solution is a bit better – David Arenburg Oct 19 '15 at 19:20
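
The two alternatives suggested in the comments above (akrun's `Reduce` and David Arenburg's `unlist` plus `matrix`) can be sketched as follows. This is only an illustration under stated assumptions: every data frame has the same number of rows and a numeric column named `val`, and the names `n`, `flat`, `avg_reduce` and `avg_matrix` are made up for the example. It uses `[[` rather than `[` to pull the column out as a plain vector.

```r
# Sample data from the question; the column name `val` is an assumption.
df.list <- list(data.frame(val = 1:10),
                data.frame(val = 11:20),
                data.frame(val = 21:30))
n <- length(df.list)  # taken from the list, so any list length works

# Option 1: element-wise sum of the data frames, divided by their count.
# Needs identical dimensions and numeric columns only.
avg_reduce <- Reduce(`+`, df.list) / n

# Option 2: flatten the chosen column, reshape to one column per data frame,
# then average each row.
flat <- unlist(lapply(df.list, `[[`, "val"))
avg_matrix <- rowMeans(matrix(flat, ncol = n))

avg_reduce$val
avg_matrix
```

`Reduce` averages every column at once and returns a data frame, while the `matrix` route targets a single column and returns a plain numeric vector.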

1 Answer

Edit after the question turned out to be something else. Does this answer your question?

dfs <- list(dfone, dftwo, dfthree)

#oneliner
res <- rowMeans(sapply(dfs,function(x){
  return(x[,"val"])
}))

#in steps

#step one: extract wanted column from all data
#this returns a matrix with one val-column for each df in the list
step1 <- sapply(dfs,function(x){
  return(x[,"val"])
})

#step two: calculate the rowmeans. this is self-explanatory
step2 <- rowMeans(step1)


#or an even shorter oneliner, with thanks to @davidarenburg:

rowMeans(sapply(dfs, `[[`, "val"))
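
For anyone who wants to see the intermediate result, here is a minimal self-contained run on the sample data from the question. It assumes the column is named `val` in every data frame and that all data frames have the same number of rows; `vals` is just an illustrative name.

```r
# Sample data from the question; the column name `val` is assumed throughout.
dfone   <- data.frame(val = 1:10)
dftwo   <- data.frame(val = 11:20)
dfthree <- data.frame(val = 21:30)
dfs <- list(dfone, dftwo, dfthree)

# Extract the `val` column from every data frame.
# With equal-length columns, sapply() binds them into a 10 x 3 matrix,
# one column per data frame.
vals <- sapply(dfs, `[[`, "val")
dim(vals)
# [1] 10  3

# Average across the columns, i.e. across data frames, row by row.
res <- rowMeans(vals)
res
#  [1] 11 12 13 14 15 16 17 18 19 20
```

No mean is computed inside `sapply`; it only collects the values, and `rowMeans` does the averaging. Because the list is passed as a whole, the same two lines work however many data frames it holds.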
Heroka
  • Hi. I'm not trying to do `mean(dfone$x[1:n])`, but rather, `mean(df[1-5]$x[1])` if you see what I mean :) – Henry Oct 19 '15 at 16:04
  • It would help if you'd added that as your expected output, as nobody interpreted it as such. Can you update your question with relevant sample data and expected output based on that? – Heroka Oct 19 '15 at 16:26
  • Thanks. It certainly looks to do the job - can you explain what it is doing to me? Thanks – Henry Oct 19 '15 at 19:24
  • You could simplify to just ```rowMeans(sapply(dfs, `[[`, "value"))``` – David Arenburg Oct 19 '15 at 19:29
  • I still don't understand how `sapply(dfs,function(x){return(x[,'val'])})` calculates a mean. I appreciate that `step1` is a column with all the values in each dataframe that is of relevance. – Henry Oct 19 '15 at 19:30
  • It doesn't. It just gets the values. As you can see, step2 has a function `rowMeans`. Now what would that do? I'm sorry, but this is very basic R. – Heroka Oct 19 '15 at 19:32
  • Gah, I'm sorry, I was mis-reading `res<-rowMeans()` as a function definition for some reason. Apologies, stupidity. Tired after a long day + have never used `rowMeans` before. Long day :) – Henry Oct 19 '15 at 19:36