How to extract the content of SQL-Files using R?

Question

I have of folder/directory of one of my colleagues full of sqlstatements. The folder is updated by him daily as well. I would like to document these sqlstatements for futures colleagues. However, I'm looking for a way to "automate" that process. I thought about to use crontab once a week and run a R-Markdown file which automatically update the existing R-Markdown file.

My approach is as follows:

path = "c:/SQL_files/"
out.file<-""
file.names <- dir(path, pattern =".sql") # here I changed `.txt` to `.sql`
for(i in 1:length(file.names)){
file <- read.csv2.sql(file.names[i],header=TRUE, sep=";",    stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}


# That second approach comes very close, but just generates a `.txt` for the first
#`.sql` file in the directory with the error:

   Error in match.names(clabs, names(xi)) : 
   names do not match previous names

where the files are:

 [1] "c:/SQL_files/first.sql"                                            
 [2] "c:/SQL_files/second.sql"     

 path = "c:/SQL_files/"
 out.file<-""
files <- list.files(path=path, pattern="*.sql", full.names=T, recursive=FALSE)
for(i in 1:length(files)){
  file <- read.table(files[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}

The loop which extracts the content of the .sql doesnt seem to capture the content at all(in the first example) or captures just the content of the first file in the directory (second example). So my question. Is there a way to extract content from a SQL Text File (.sql)? Which may result in the .txt/.Rmd as follows: (but doesnt have to):

output of the first loop: my_sql_statement.sql

output of the second loop: Select * From Data

So you just want to take all the queries from various files and put thein in one .txt? — Jakub Kania, Jan 29 '16 at 13:42
Hi @JakubKania Yes, exactly.Preferably .Rmd,but that just an ideal — Mamba, Jan 29 '16 at 13:45
And you are sure you don't want to use version control system instead? — Jakub Kania, Jan 29 '16 at 13:55
Yes that would be great to have the upper command running. But thank you for your suggestion! — Mamba, Jan 29 '16 at 14:09

score 5 · Accepted Answer · edited May 23 '17 at 11:52

This RMD file generates a markdown/HTML document listing some meta data and the content of all files specified:

---
title: "Collection of SQL files"
author: "SQLCollectR"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output: 
  html_document: 
    keep_md: yes
---

```{r setup, echo = FALSE}
library(knitr)
path <- "files/"
extension <- "sql"
```

This document contains the code from all files with extension ``r extension`` in ``r paste0(getwd(), "/", path)``.

```{r, results = "asis", echo = FALSE}
fileNames <- list.files(path, pattern = sprintf(".*%s$", extension))
fileInfos <- file.info(paste0(path, fileNames))

for (fileName in fileNames) {
  filePath <- paste0(path, fileName)
  cat(sprintf("## File `%s` \n\n### Meta data \n\n", fileName))
  cat(sprintf(
    "| size (KB) | mode | modified |\n|---|---|---|\n %s | %s | %s\n\n", 
    round(fileInfos[filePath, "size"]/1024, 2), 
    fileInfos[filePath, "mode"],
    fileInfos[filePath, "mtime"]))
  cat(sprintf("### Content\n\n```\n%s\n```\n\n", paste(readLines(filePath), collapse = "\n")))
}

```

All the work is done in the for loop which iterates over all files in path whose names end in extension. For each file, a table with "meta data" is printed, followed by the actual file content. The meta data is retrieved using file.info and consists of file size, mode and last modified timestamp.

The cat(sprintf(... constructs containing markdown make the code look complicated, but it fact it is fairly simple.

Sample output

Using SQL files with SQL statements from this answer, the RMD file above generates the following output (using HTML as output format):

Thats amazing! Thank you so much! – Mamba Jan 29 '16 at 21:11 — Mamba, Jan 29 '16 at 21:11

Craig · Answer 2 · 2016-01-29T17:07:22.677

1

In order to read content from a text file that does not represent a delimited table, you might need to use readLines instead of read.table. The R way of doing this would use lapply:

files <- list.files(path=path, pattern="*.sql", full.names=T, recursive=FALSE)
out.files <- lapply(files,readLines)

This will give you a list containing a vector of characters (each element is one line of the file).

EDIT:

To answer the rest of your question, this sort of data could be converted into a single text file using writeLines.

names(out.files)<-files
printer = file("out.sql","w");
lapply(files,function (x) 
{
   writeLines(x,printer);
   writeLines(out.files[[x]],printer);
})
close(printer)

I would only do all this if you are doing some other manipulation in R, otherwise there are easier ways to append a bunch of files into one file.

edited Jan 29 '16 at 17:07

answered Jan 29 '16 at 15:32

Craig

4,492
2
19
22

Hi Thank you. But is there a way to separate/include the file name itself into the produced .txt. Without that its just .txt with just .sql statements – Mamba Jan 29 '16 at 15:46
If the names have not been attached to out.files, you can use names(out.files)<-files. Is this sufficient for you to create your output file? – Craig Jan 29 '16 at 16:04
The output looks very unstructured. I'm using `write.list {erer}`. – Mamba Jan 29 '16 at 16:19
@HiThere Perhaps you should try `writeLines` as in the answer instead of `write.list {erer}`. – Gregor Thomas Jan 29 '16 at 19:48

How to extract the content of SQL-Files using R?

2 Answers2

Sample output

Linked