averaging every 16 columns in r

Question

Possible Duplicate:
apply a function over groups of columns

I have a data.frame with 30 rows and many columns (1000+), and I need to average every 16 columns together. For example, the data frame looks like this (truncated for readability):

Col1            Col2            Col3            Col4........

4.176           4.505           4.048           4.489
6.167           6.184           6.359           6.444
5.829           5.739           5.961           5.764
.
.
.

Therefore, I cannot aggregate (I do not have a list) and I tried:

a <- data.frame(rowMeans(my.df[,1:length(my.df)]) )

which gives me the average of all 1000+ columns. Is there a way to do that for every 16 columns until the end? (The total number of columns is a multiple of 16.)

A secondary, less important point that would also be useful to solve: the column names have the following structure:

XXYY4ZZZ.txt

Once the columns are averaged, all I need is a new column name containing only XXYY, as the rest will be averaged out. I know I could use gsub, but is there an optimal way to do the averaging and the renaming in one go?

I am still relatively new to R and therefore I am not sure where and how to find the answer.

agreed @Joran, the answers to my question that you link to should be readily adaptable to answer this question. — Ben, May 22 '12 at 15:22

score 5 · Answer 1 · edited May 23 '17 at 10:29

Here is an example adapted from @ben's question and @TylerRinker's answer in "apply a function over groups of columns". It should be able to apply any function over a matrix or data frame in blocks of columns.

# Create sample data for reproducible example
n <- 1000
set.seed(1234)
x <- matrix(runif(30 * n), ncol = n)

# Function to apply 'fun' to object 'x' over every 'by' columns
# Alternatively, 'by' may be a vector of groups
byapply <- function(x, by, fun, ...)
{
    # Create index list
    if (length(by) == 1)
    {
        nc <- ncol(x)
        split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
    } else # 'by' is a vector of groups
    {
        nc <- length(by)
        split.index <- by
    }
    index.list <- split(seq(from = 1, to = nc), split.index)

    # Pass index list to fun using sapply() and return object
    sapply(index.list, function(i)
            {
                # drop = FALSE keeps single-column groups two-dimensional
                do.call(fun, list(x[, i, drop = FALSE], ...))
            })
}

# Run function
y <- byapply(x, 16, rowMeans)

# Test to make sure it returns expected result
y.test <- rowMeans(x[, 17:32])
all.equal(y[, 2], y.test)
# TRUE
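
To see how the index list inside byapply is built, here is a small sketch of the same rep() and split() steps on a toy case. The sizes (10 columns, blocks of 4) are chosen only for illustration and are not from the original data; they are deliberately not a multiple of each other, to show that the last block simply ends up shorter.

```r
# Toy case: 10 columns in blocks of 4 (illustrative sizes only)
nc <- 10
by <- 4

# Group label for each column; length.out truncates the final block
split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
split.index
# 1 1 1 1 2 2 2 2 3 3

# Column positions belonging to each group
index.list <- split(seq_len(nc), split.index)
lengths(index.list)
# group sizes: 4, 4, 2
```

The same thing happens with the sample data above: 1000 is not a multiple of 16, so the last of the 63 groups holds only 8 columns.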

You can do other odd things with it. For example, if you needed to know the total sum of every 10 columns, being sure to remove NAs if present:

y.sums <- byapply(x, 10, sum, na.rm = TRUE)
y.sums[1]
# 146.7756
sum(x[, 1:10], na.rm = TRUE)
# 146.7756

Or find the standard deviations:

byapply(x, 10, apply, 1, sd)

Update

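by can also be specified as a vector of groups, with one entry per column to be grouped. Note that the call below supplies 100 entries, so only the first 100 columns of x are used: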

byapply(x, rep(1:10, each = 10), rowMeans)
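
For the secondary point in the question (naming each averaged column XXYY), one possible approach is to derive the group labels from the column names themselves. The following is a minimal sketch in base R, not part of the original answer. It assumes the label is the first four characters of each name, and the data frame and column names are made up for illustration.

```r
# Hypothetical data: 30 rows, 32 columns, names shaped like "XXYY4ZZZ.txt"
set.seed(1)
my.df <- as.data.frame(matrix(runif(30 * 32), nrow = 30))
names(my.df) <- paste0(rep(c("AABB", "CCDD"), each = 16), "4",
                       sprintf("%03d", 1:16), ".txt")

# Assumption: the group label is the first four characters of each name
groups <- substr(names(my.df), 1, 4)

# One column of row means per group, named after the group
idx <- split(seq_along(my.df), groups)
avg <- sapply(idx, function(i) rowMeans(my.df[, i, drop = FALSE]))

colnames(avg)
# "AABB" "CCDD"
```

The same groups vector could also be passed as by, as in byapply(my.df, groups, rowMeans). Because split() orders groups by factor level, the result columns come out in alphabetical order of the labels rather than in their original column order.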

Hi jthetzel, thank you very much! This worked just fine. I thought that with my basic R level it would take longer, but it actually went very smoothly. Thank you again! – david, May 22 '12 at 17:20
Thank you very much for the update!! These are all useful comments!! – david, May 23 '12 at 12:47

score 0 · Answer 2 · edited May 22 '12 at 18:54

This works for me on a much smaller data frame:

rowMeans(my.df[,seq(1,length(my.df),by=16)])

answered May 22 '12 at 15:31 by bob.sacamento · edited May 22 '12 at 18:54 by David LeBauer

you're taking the mean of only the columns in that sequence (1, 17, 33, etc.) rather than the mean of the group of columns 1:16, 17:32 etc. – Justin May 22 '12 at 15:57
Hi Justin, I am taking the mean of the columns 1:16 then from 17 to 32 and so on. Bob and Ben and Joran, thank you for the answers! I will try different things and see how it goes. – david May 22 '12 at 16:03
Sorry. Misread the question. – bob.sacamento May 22 '12 at 16:32
That is ok Bob, thank you for reading, I am sure I will be able to use your code in other situations. – david May 22 '12 at 17:20
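
To make the distinction raised in the comments above concrete, here is a small sketch contrasting the two operations on made-up data (30 rows, 32 columns). The data frame is hypothetical, and the grouping line is just one base R way to form blocks of 16 adjacent columns.

```r
# Hypothetical data: 30 rows, 32 columns
set.seed(42)
my.df <- as.data.frame(matrix(runif(30 * 32), nrow = 30))

# Every 16th column (here columns 1 and 17): a single mean per row
picked <- rowMeans(my.df[, seq(1, length(my.df), by = 16)])

# Block means: one result column per block of 16 adjacent columns
blocks <- split(seq_along(my.df), ceiling(seq_along(my.df) / 16))
grouped <- sapply(blocks, function(i) rowMeans(my.df[, i, drop = FALSE]))

length(picked)  # 30
dim(grouped)    # 30 2
```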
