
I have a data frame with 4 columns:

a <- data.frame(a=c(1,2,3,4), b=c(4,5,6,7), c=c(7,6,5,4), d=c(8,4,3,2))

I want to average the first two columns and the last two columns, to get one data frame with the same number of rows and two columns: one for the first pair and one for the last pair.

expected output:

5 15
7 10
9 8
11 6
user1631306
  • Possible duplicate: http://stackoverflow.com/questions/5559467/how-to-merge-two-columns-in-r-with-a-specific-symbol. Take a look, it might do what you need. – MCP_infiltrator Feb 10 '14 at 14:31

1 Answer

To reproduce your output (which is sum, not mean):

library(plyr)
ddply(a, .(), summarise, first=a+b, second=c+d)[,-1]

It produces:

  first second
1     5     15
2     7     10
3     9      8
4    11      6

To make a data.frame with averages:

ddply(a, .(), summarise, first=(a+b)/2, second=(c+d)/2)[,-1]

Output is:

  first second
1   2.5    7.5
2   3.5    5.0
3   4.5    4.0
4   5.5    3.0

If you don't know the column names, the code can be modified like this:

ddply(a, .(), summarise, first=a[,1]+a[,2], second=a[,3]+a[,4])[,-1]

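Here the columns are accessed by position. Note that this form can fail with "incorrect number of dimensions" when the data.frame shares its name with one of its columns, as a does here, most likely because a inside summarise then refers to the column rather than the data.frame. Alternatively, you can simply run names(a) <- letters[1:4] prior to ddply().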

ddply is a very flexible function: you can specify grouping variables as the second argument and get grouped results. But if the case is as simple as in the question, you can call summarise directly:

summarise(a, first=a+b, second=c+d)                 # if you know columns' names
summarise(a, first=a[,1]+a[,2], second=a[,3]+a[,4]) # if you don't know columns' names
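As a side note, when only the column positions are known, the same sums and averages can also be sketched in base R without plyr. This is an alternative to the ddply/summarise calls above, not a variant of them; it assumes the pairs are always the first two and the last two columns, and the names n, sums and means are chosen only for illustration.

```r
# Same example data as in the question
a <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7),
                c = c(7, 6, 5, 4), d = c(8, 4, 3, 2))

n <- ncol(a)  # position of the last column

# Row-wise sums of the first two and the last two columns, by position
sums <- data.frame(first  = rowSums(a[, 1:2]),
                   second = rowSums(a[, (n - 1):n]))

# Row-wise averages of the same column pairs
means <- data.frame(first  = rowMeans(a[, 1:2]),
                    second = rowMeans(a[, (n - 1):n]))

print(sums)
print(means)
```

Because the columns are selected from outside any summarise call, it does not matter whether the data.frame and one of its columns share a name.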
redmode
  • Thanks, that worked. But I don't have any column names; the ones above were just an example. All I know is that I need to add the first two columns and the last two columns. How would I pass that information? – user1631306 Feb 10 '14 at 14:40
  • Simply run `names(a) <- letters[1:4]` prior to `ddply` to assign names – redmode Feb 10 '14 at 14:42
  • @redmode, `summarise(a, first=a[,1]+a[,2], second=a[,3]+a[,4])` throws an error: "Error in a[, 1] incorrect number of dimensions" – user1631306 Feb 10 '14 at 15:22
  • @user1631306, I think it's because `data.frame` `a` has column named `a`. – redmode Feb 10 '14 at 19:37
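
The name clash suggested in the last comment can be illustrated with a small base R sketch. It uses with(), which is assumed here to resolve names the same way summarise does (columns of the data first, then the calling environment); the names df, clash and ok are hypothetical and used only for this illustration.

```r
# Same example data as in the question: the data.frame and a column are both called `a`
a <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7),
                c = c(7, 6, 5, 4), d = c(8, 4, 3, 2))

# Inside with(), `a` resolves to the column a (a plain vector),
# so two-index subsetting fails; capture the error message instead of stopping
clash <- tryCatch(with(a, a[, 1] + a[, 2]),
                  error = function(e) conditionMessage(e))
print(clash)

# Hypothetical rename: `df` does not collide with any column name,
# so df[, 1] refers to the data.frame as intended
df <- a
ok <- with(df, df[, 1] + df[, 2])
print(ok)
```

If this assumption holds, giving the data.frame a name that differs from all of its column names should be enough to make the position-based summarise call work.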