
I've just cleaned up a data frame that I scraped from an Excel spreadsheet by, amongst other things, removing percentage signs from some of the numbers (see Removing Percentages from a Data Frame).

The data has twenty-four rows representing the parameters and results from eight experiments done in triplicate, e.g. what one would get from:

DF1 <- data.frame(X = 1:24, Y = 2 * (1:24), Z = 3 * (1:24))

I want to find the mean of each triplicate (the replicates are, fortunately, in sequential order) and create a new data frame with eight rows and the same number of columns.

I tried to do this using:

DF2 <- data.frame(replicate(3,sapply(DF1, mean)))

which just gave me the overall mean of each column, repeated three times. I wanted a data frame that would give me:

data.frame(X = c(2,5,8,11,14,17,20,23), Y = c(4,10,16,22,28,34,40,46), Z = c(6,15,24,33,42,51,60,69))

which I worked out by hand; this is the reduced result I am after.
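Hand-worked targets are easy to get wrong, so a programmatic check helps. A minimal base-R sketch, assuming the triplicates always occupy consecutive rows and the row count is a multiple of 3. The group index `grp` is a name introduced here for illustration; `tapply()` averages each column within each group:

```r
DF1 <- data.frame(X = 1:24, Y = 2 * (1:24), Z = 3 * (1:24))

# Hypothetical group index: rows 1-3 -> 1, rows 4-6 -> 2, ..., rows 22-24 -> 8.
# Assumes the replicates of each experiment are stored in consecutive rows.
grp <- (seq_len(nrow(DF1)) - 1) %/% 3 + 1

# Average each column within each group of three rows; sapply() returns an
# 8 x 3 matrix with column names X, Y, Z, which is then made a data frame.
DF2 <- as.data.frame(sapply(DF1, function(col) tapply(col, grp, mean)))
print(DF2)
```

Comparing `DF2` against the hand-worked table flags any arithmetic slips.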

Thanks, ...

Any help would be gratefully received.

DarrenRhodes

2 Answers


Nice task for codegolf!

aggregate(DF1, list(rep(1:8, each=3)), mean)[,-1]

To be more general, replace 8 with nrow(DF1)/3, the number of triplicates.

... or, my favorite, using matrix multiplication:

t(t(DF1) %*% diag(8)[rep(1:8,each=3),]/3)
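The matrix-multiplication version works because `diag(8)[rep(1:8, each = 3), ]` is a 24 x 8 indicator matrix whose column j has a 1 in each row belonging to triplicate j; multiplying by it sums each triplicate, and dividing by 3 turns the sums into means. A sketch of the same idea for an arbitrary block size, assuming every experiment has the same number of replicates. It uses base R's `crossprod()` (which computes `t(A) %*% B`) to avoid the double transpose; `k`, `n_groups` and `M` are names chosen here for illustration:

```r
DF1 <- data.frame(X = 1:24, Y = 2 * (1:24), Z = 3 * (1:24))

k <- 3                      # replicates per experiment (assumed constant)
n_groups <- nrow(DF1) / k   # 8 for the example data

# 24 x 8 indicator matrix: M[i, j] == 1 when row i belongs to group j.
M <- diag(n_groups)[rep(seq_len(n_groups), each = k), ]

# crossprod(M, X) equals t(M) %*% X: an 8 x 3 matrix of group sums,
# which dividing by k turns into group means (column names are kept).
means <- crossprod(M, as.matrix(DF1)) / k
print(means)
```

As with the original one-liner, the result is a matrix; wrap it in `as.data.frame()` if a data frame is needed.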
Tomas
  • Thanks for both of your answers. Does the first one return a data.frame and the second one return a matrix? I ask because the way that R returns the results looks slightly different. – DarrenRhodes Jan 18 '13 at 14:10
  • @user1945827, exactly. You can typecast them using `as.matrix` or `as.data.frame`. – Tomas Jan 18 '13 at 14:13
  • In the first answer, when I change '8' to nrow(DF1) I get an error. Don't know why; thought you might like to know, though. – DarrenRhodes Jan 18 '13 at 14:28
  • @user1945827 - works for me. What error do you get? Is the DF1 still the original `data.frame` as you defined it in your question? – Tomas Jan 18 '13 at 14:33
  • "Is the DF1 still the original...". Ah, no. The other pieces of code provided by yourself and others are more than sufficient. I'll reproduce the error and post back here tomorrow. – DarrenRhodes Jan 18 '13 at 16:46
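The error reported in the comments is most likely a length mismatch (the actual message was not posted): with 8 replaced by `nrow(DF1)`, `rep(1:nrow(DF1), each = 3)` has 72 elements while `DF1` has only 24 rows, and `aggregate()` requires each grouping vector to have one entry per row. Using the number of triplicates, `nrow(DF1) / 3`, avoids this. A sketch of a reusable version, where `block_means` is a hypothetical helper name and `k` is the assumed number of replicates per experiment:

```r
DF1 <- data.frame(X = 1:24, Y = 2 * (1:24), Z = 3 * (1:24))

# Too long: 72 group labels for 24 rows, so aggregate() stops with an error.
bad_groups <- rep(1:nrow(DF1), each = 3)
print(length(bad_groups))  # 72

# Hypothetical helper: mean of every consecutive block of k rows.
block_means <- function(df, k = 3) {
  stopifnot(nrow(df) %% k == 0)  # every experiment needs exactly k rows
  groups <- rep(seq_len(nrow(df) / k), each = k)
  # aggregate() prepends a Group.1 column; [, -1] drops it again.
  aggregate(df, by = list(groups), FUN = mean)[, -1]
}

DF2 <- block_means(DF1)
print(DF2)
```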

This works:

foo <- matrix(unlist(by(data=DF1,INDICES=rep(1:8,each=3),FUN=colMeans)),
  nrow=8,byrow=TRUE)
colnames(foo) <- colnames(DF1)

Look at ?by.

Stephan Kolassa
  • Hi @Stephan, your code almost works. I've lost my column headings. I tried the script again using 'data.frame' instead of 'matrix' but this returned a mess. I'll stick to your script and use 'names' to put the headers back if nothing else turns up. Thanks, – DarrenRhodes Jan 18 '13 at 13:20
  • I edited the code to add the `colnames`. But @Tomas' solution is much prettier, anyway, so +1 to him. – Stephan Kolassa Jan 18 '13 at 13:27
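Conceptually, `by()` splits the rows of `DF1` according to `INDICES` and applies `FUN` to each piece. The same split-apply pattern can be written with base R's `split()` and `sapply()`, which keeps the column headings without a separate `colnames()` call and converts cleanly to a data frame. A sketch, assuming consecutive triplicates as in the question; `grp` is an illustrative name:

```r
DF1 <- data.frame(X = 1:24, Y = 2 * (1:24), Z = 3 * (1:24))
grp <- rep(1:8, each = 3)

# split() yields a list of eight 3-row data frames; colMeans() reduces each
# to a named vector (X, Y, Z), and sapply() binds those as the columns of a
# 3 x 8 matrix, so t() turns it into 8 rows with named columns X, Y, Z.
foo <- t(sapply(split(DF1, grp), colMeans))
DF2 <- as.data.frame(foo)
print(DF2)
```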