0

I have written the below query to study some sample means for my analysis and this work fine and I do not require to mug up this code. However I need to comprehend the use of "",; sep="" along with limit statement in dbGetQuery.

df <- data.frame(Sample.Num = integer(),
                 Sample.Mean = integer(),
                 quant.01 = integer(),
                 quant.05 = integer(),
                 quant.10 = integer(),
                 quant.25 = integer(),
                 quant.50 = integer(),
                 quant.75 = integer(),
                 stringsAsFactors = FALSE)
df[1,] <- NA
for (i in 1:500){
  sdf <- dbGetQuery(con,"select col11,col23,col30
                    from schema.db.name
                    where col1 in ('value1')
                    and col2 in ('(v3) - value3')
                    and col3 in ('v0123 - value4')
                    order by random()*600000 limit 100")
  meansample <- mean(sdf$mileage,na.rm = TRUE)
  quant.01 <- quantile(sdf$mileage,na.rm = TRUE,probs = .01)
  quant.05 <- quantile(sdf$mileage,na.rm = TRUE,probs = .05)
  quant.10 <- quantile(sdf$mileage,na.rm = TRUE,probs = .10)
  quant.25 <- quantile(sdf$mileage,na.rm = TRUE,probs = .25)
  quant.50 <- quantile(sdf$mileage,na.rm = TRUE,probs = .50)
  quant.75 <- quantile(sdf$mileage,na.rm = TRUE,probs = .75)
dbDisconnect(con)

Problem:

Col1,2 & 3 have character values. Now I need to check the same for all 7 values of col2 and 9 values of col3. Whenever I use any specific values of col1,2,3 I need to store them in a data frame to perform some operations before the loop ends and restarts with second value.

Hence this calls for studying: Add a dynamic value into RMySQL getQuery

I did study gsub and shQuote as well. I am unable to conceptualize the use of '", df[i], "';", sep = "" despite of several combinations along with limit statement. Had it been a local database I could have used dplyr package to run the loop. I understand seq_along() requires a numeric value and in order to pass through the loop which sets right 'correct value' inside in () query. I did try paste0 and paste arguments along with collapse="," arguements but did not help. I also tried lapply from below post but it does not work inside in () query.

How to do dbGetQuery for loop in R

(v3) - value3, (v1) - value1 etc are the actual elements in col. This is combination of code and description of the attribute col2, col3 etc.

Community
  • 1
  • 1
ARIMITRA MAITI
  • 303
  • 3
  • 4
  • 14
  • `col2 in ('(v3) - value3')` ... this makes no sense as a `WHERE IN` condition, and it certainly won't behave the way you think. – Tim Biegeleisen Sep 12 '16 at 13:27
  • @Tim Biegeleisen Apologies for not posting clearly. Is there any way you can explain me the use of '", df[i], "';", sep = "" while preparing in () statement to be used in where clause? – ARIMITRA MAITI Sep 12 '16 at 14:32
  • If you mean to compare `col2` against the text `(v3) - value` literally, then the condition is correct, but you don't need to use `WHERE IN` with a single value. Instead, you could have said `WHERE col2 = '(v3) - value`' – Tim Biegeleisen Sep 12 '16 at 14:34
  • Thank You. I know where and in clause in perspective of SQL. I was just looking for some help to comprehend the complicated post How to add a dynamic value into RMySQL getQuery It has a lot of useful concept but a dumb like me failed to get the knowledge behind it. Many Thanks – ARIMITRA MAITI Sep 12 '16 at 15:09

1 Answers1

0

Let’s assume that you have a query and want to substitute some range values in it.

sql <- "select col11,col23,col30
                    from schema.db.name
                    where col1 in (%s)
                    and col2 in (%s)
                    and col3 in (%s)
                    order by random()*600000 limit 100"

Here I will use sprintf to do the substitutions in the query, so I have left markers (%s) in the string. So let’s setup 3 ranges that we want to test for:

col1 <- 1:10
col2 <- c('a', 'f', 'z')
col3 <- c('name1', 'name2')

# create strings that are valid in SQL
col1_sql <- paste(col1, collapse = ',')
col2_sql <- paste0("'", col2, "'", collapse = ',')  # put quotes on strings
col3_sql <- paste0("'", col3, "'", collapse = ',')

# now substitute back in query
sql_new <- sprintf(sql,
                   col1_sql,  # strings we just constructed
                   col2_sql,
                   col3_sql
)

# print out the query
cat(sql_new)

Here is what the query will be:

select col11,col23,col30
    from schema.db.name
    where col1 in (1,2,3,4,5,6,7,8,9,10)
    and col2 in ('a','f','z')
    and col3 in ('name1','name2')
    order by random()*600000 limit 100