I'm relatively new to R and SQL and I'm stuck with a query, hoping someone can answer my question.
I'm using the RODBC package and I need to do a rather sophisticated query.
I have an R dataframe that looks like this :
donor_id <- c(1,2)
date <- c(as.Date('2011-01-01'), as.Date('2013-06-14'))
df <- cbind(donor_id, date)
And a big database that has donation dates for every of my donor_id.
I want to add a column to my data frame from the database. This column will be the number of donations that each of my donor_id has made before the date in my data frame.
So according to this post, there are 3 solutions :
- Query the whole database and work from there (takes forever)
- Save the data frame into the server and join them in a query (I don't have sufficient rights for this)
- Pass the conditions from the data frame, which is what I'm trying to do
I have drawn ideas from this post for the general MO and this post for the sum of rows.
The general idea is to merge my data frame and the query
merge(df,query, by='donor_id')
The query looks like this :
paste("select donor_id, SUM(CASE WHEN donation_date <= ",df$date,"
THEN collection_count ELSE 0 END) AS 'ncol' from fact_collection
GROUP BY donor_id"))
However, when I'm trying this, I get the following error message :
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
Which means that my merge cannot recognize the donor_id for some reason.