0

I am trying to figure out how to join/append a column of values from one column to another in R based upon criteria. I'm sure this is obvious but I'm having trouble trying to teach myself how to do this.

If someone could show me both how to do this with sqldf and 'normal'/ easiest/best way in normal R code that would be great!!!!

library(sqldf)
library(plyr)
x=mtcars
groups=ddply(x, .(gear, cyl), summarise, avgMPG=mean(mpg),avgHP=mean(mpg),  .drop=FALSE)

#help below --- I get an error

mtcars$avg_hp=sqldf("select avgHP from groups where mtcars$gear=groups$gear and    mtcars$cyl=groups$cyl")
mtcars$avg_mpg=sqldf("select avgMPG from groups where mtcars$gear=groups$gear and    mtcars$cyl=groups$cyl")
Jaap
  • 81,064
  • 34
  • 182
  • 193
runningbirds
  • 6,235
  • 13
  • 55
  • 94
  • can you please provide a example of what you are trying to do. perhaps you are looking for `merge`? – Alex Aug 27 '14 at 06:42

1 Answers1

1

The correct syntax for your SQL join is this:

sqldf("SELECT * FROM groups 
       INNER JOIN mtcars 
       ON mtcars.gear = groups.gear AND mtcars.cyl=groups.cyl")

Resulting in:

   gear cyl avgMPG  avgHP  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1     3   4 21.500 21.500 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
2     3   6 19.750 19.750 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
3     3   6 19.750 19.750 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
4     3   8 15.050 15.050 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
....

And the equivalent in R is

merge(mtcars, groups, all.x=TRUE)

Resulting in

   cyl gear  mpg  disp  hp drat    wt  qsec vs am carb avgMPG  avgHP
1    4    3 21.5 120.1  97 3.70 2.465 20.01  1  0    1 21.500 21.500
2    4    4 24.4 146.7  62 3.69 3.190 20.00  1  0    2 26.925 26.925
3    4    4 22.8 140.8  95 3.92 3.150 22.90  1  0    2 26.925 26.925
4    4    4 22.8 108.0  93 3.85 2.320 18.61  1  1    1 26.925 26.925
5    4    4 33.9  71.1  65 4.22 1.835 19.90  1  1    1 26.925 26.925
....

You can also use the function ave() to do the grouping calculation in a single step:

within(mtcars, {
  avgMPG <- ave(mpg, gear, cyl, FUN = mean)
  avgHP  <- ave(mpg, gear, cyl, FUN = mean)
})


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  avgHP avgMPG
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 19.750 19.750
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 19.750 19.750
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 26.925 26.925
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 19.750 19.750
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 15.050 15.050
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 19.750 19.750
.....
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
Andrie
  • 176,377
  • 47
  • 447
  • 496