1

I need to apply following SQL code into R.

SELECT col1, col2, col3, col4, col5, COUNT(1) AS newcol, 
SUM(othercol) AS newcol2, SUM(othercol*othercol2) AS newcol3 FROM df;
GROUP BY col1, col2, col3, col4, col5;
WHERE 'some conditions'

I understand how SELECT, GROUP BY, COUNT(1), SUM() and AS() works individually but not as a whole like in the code above, mainly how COUNT(1) and SUM() are working.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Cdives
  • 21
  • 1
  • 2
  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Jun 16 '16 at 06:28

1 Answers1

5

As the OP didn't provide a reproducible example, the following sql syntax works (using sqldf)

library(sqldf)
sqldf("select col1, col2, COUNT(1) as newcol, 
       sum(othercol) as newcol2 
       from df 
       where col1 = 1 
       group by col1, col2")
#  col1 col2 newcol      newcol2
#1    1    a      2 -0.009295454
#2    1    b      2 -0.164004051

The above can also be done using R methods

library(data.table)
setDT(df)[col1==1, .(newcol=.N, newcol2 = sum(othercol)), .(col1, col2)]
#   col1 col2 newcol      newcol2
#1:    1    a      2 -0.009295454
#2:    1    b      2 -0.164004051

Or using dplyr

library(dplyr)
df %>%
    filter(col1 == 1) %>%
    group_by(col1, col2) %>%
    summarise(newcol = n(), newcol2 = sum(othercol))

data

set.seed(24)
df <- data.frame(col1 = rep(1:4, each = 4), col2 = rep(letters[1:2], 
  each = 2), othercol = rnorm(16), othercol2 = runif(16))
akrun
  • 874,273
  • 37
  • 540
  • 662