6

A few posters have asked similar questions on here and these have taken me 80% of the way toward reading text files with sql queries in them into R to use as input to RODBC:

Import multiline SQL query to single string

RODBC Temporary Table Issue when connecting to MS SQL Server

However, my sql files have quite a few comments in them (as --comment on this and that). My question is, how would one go about either stripping comment lines from query on import, or making sure that the resulting string keeps line breaks, thus not appending actual queries to comments?

For example, query6.sql:

--query 6
select a6.column1, 
    a6.column2,
    count(a6.column3) as counts
--count the number of occurences in table 1 
from data.table a6
group by a6.column1

becomes:

sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr 
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurences in table 1from data.table a6 group by a6.column1"

when read into R.

Community
  • 1
  • 1
carnust
  • 611
  • 1
  • 8
  • 10

6 Answers6

2

Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:

> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

This works too:

> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • The first solution above doesnt seem to work for queries built with many comments. The second solution (adding line breaks) does seem to do the trick. I have also found that for comments that are not on separate lines, replacing '--' with '\n--' works. – carnust Dec 17 '13 at 15:26
2

I had trouble with the other answer, so I modified Roman's and made a little function. This has worked for all my test cases, including multiple comments, single-line and partial-line comments.

read.sql <- function(filename, silent = TRUE) {
    q <- readLines(filename, warn = !silent)
    q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
    q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
    q <- paste(q, collapse = " ")
    return(q)
}
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
2

Summary

Function clean_query:

  • Removes all mixed comments
  • Creates single string output
  • Takes a SQL path or text string
  • Is simple

Function

require(tidyverse)

# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){


  # if sql path, read, otherwise assume text input
  if( str_detect(text_or_path, "(?i)\\.sql$") ){

    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")

  }


  # echo original query to the console 
  #  (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")


  # return
  text_or_path %>% 
    # remove all demarked /*  */ sql comments 
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>% 
    # remove all demarked -- comments 
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>% 
    # remove everything after the query-end semicolon 
    gsub(pattern = ';.*', replacement = ' ') %>% 
    #remove any line break, tab, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%  
    # remove extra whitespace 
    gsub(pattern = ' +', replacement = ' ') 

}

You could attach regexps together if you want incomprehensibly long expressions, but I recommend readable code.



Output for "query6.sql"

[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "



Additional Text Input Example

query <- "

    /* this query has 
    intentionally messy 
    comments
    */

    Select 
       COL_A -- with a comment here
      ,COL_B
      ,COL_C
    FROM 
      -- and some helpful comment here
      Database.Datatable
    ;
    -- or wherever

    /* and some more comments here */

"

Call function:

clean_query(query)

Output:

[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "



If you want to test reading from a .sql file:

temp_path <- path.expand("~/query.sql")

cat(query, file = temp_path)

clean_query(temp_path)

file.remove(temp_path)
Tori Oblad
  • 325
  • 3
  • 5
1

Something like this?

> cat("--query 6
+ select a6.column1, 
+ a6.column2,
+ count(a6.column3) as counts
+ --count the number of occurences in table 1 
+ from data.table a6
+ group by a6.column1", file = "query6.sql")
> 
> my.q <- readLines("query6.sql")
Warning message:
In readLines("query6.sql") : incomplete final line found on 'query6.sql'
> my.q
[1] "--query 6"                                    "select a6.column1, "                         
[3] "a6.column2,"                                  "count(a6.column3) as counts"                 
[5] "--count the number of occurences in table 1 " "from data.table a6"                          
[7] "group by a6.column1"                         
> find.com <- grepl("--", my.q)
> 
> my.q <- my.q[!find.com]
> paste(my.q, collapse = " ")
[1] "select a6.column1,  a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1"
> 
> unlink("query6.sql")
> rm(list = ls())
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
  • This works perfectly to remove the comments. Is there a way to specify line breaks instead of the white space when collapsing with paste() function? – carnust Dec 17 '13 at 13:42
  • @carnust is the line break purely for visual aid, or are you trying to achieve something else? – Roman Luštrik Dec 17 '13 at 14:23
  • I am thinking that the line break is interpreted as such by the SQL server and therefore that any words between the '--' marker and the line break are taken as comments? – carnust Dec 17 '13 at 15:31
0

had to solve a similar problem lately using another language and still find R to be easier to implement

readSQLFile <- function(fname, retainNewLines=FALSE) {
    lines <- readLines(fname)

    #remove -- type comments
    lines <- vapply(lines, function(x) {
        #handle /* -- */ type comments
        if (grepl("/\\*(.*)--", x)) 
            return(x)
        strsplit(x,"--")[[1]][1]
    }, character(1))

    #remove /* */ type comments
    sqlstr <- paste(lines, collapse=ifelse(retainNewLines, "&&&&&&&&&&" , " "))
    sqlstr <- gsub("/\\*(.|\n)*?\\*/","",sqlstr)
    if (retainNewLines) {
        sqlstr <- strsplit(sqlstr, "&&&&&&&&&&")[[1]]
        sqlstr <- sqlstr[sqlstr!=""]
    }
    sqlstr
} #readSQLFile


#example
fname <- tempfile("sql",fileext=".sql")
cat("--query 6
select a6.column1, --trailing comments
a6.column2, ---test triple -
count(a6.column3) as counts, --/* funny comment */
a6.column3 - a6.column4 ---test single - 
/*count the number of occurences in table 1; 
test another comment style
*/
from data.table a6 /* --1st weirdo comment */
/* --2nd weirdo comment */
group by a6.column1\n", file=fname)

#remove new lines
readSQLFile(fname)

#retain new lines
readSQLFile(fname, TRUE)

unlink(fname)
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
0

It's possible to use readChar() instead of readLines(). I had an ongoing issue with mixed commenting (-- or /* */) and this has always worked well for me.

sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
Rob
  • 834
  • 1
  • 10
  • 15