4

I have of folder/directory of one of my colleagues full of sqlstatements. The folder is updated by him daily as well. I would like to document these sqlstatements for futures colleagues. However, I'm looking for a way to "automate" that process. I thought about to use crontab once a week and run a R-Markdown file which automatically update the existing R-Markdown file.

My approach is as follows:

path = "c:/SQL_files/"
out.file<-""
file.names <- dir(path, pattern =".sql") # here I changed `.txt` to `.sql`
for(i in 1:length(file.names)){
file <- read.csv2.sql(file.names[i],header=TRUE, sep=";",    stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}


# That second approach comes very close, but just generates a `.txt` for the first
#`.sql` file in the directory with the error:

   Error in match.names(clabs, names(xi)) : 
   names do not match previous names 

where the files are:

 [1] "c:/SQL_files/first.sql"                                            
 [2] "c:/SQL_files/second.sql"     

 path = "c:/SQL_files/"
 out.file<-""
files <- list.files(path=path, pattern="*.sql", full.names=T, recursive=FALSE)
for(i in 1:length(files)){
  file <- read.table(files[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}

The loop which extracts the content of the .sql doesnt seem to capture the content at all(in the first example) or captures just the content of the first file in the directory (second example). So my question. Is there a way to extract content from a SQL Text File (.sql)? Which may result in the .txt/.Rmd as follows: (but doesnt have to):

output of the first loop: my_sql_statement.sql

output of the second loop: Select * From Data

Wai Ha Lee
  • 8,598
  • 83
  • 57
  • 92
Mamba
  • 1,183
  • 2
  • 13
  • 33

2 Answers2

5

This RMD file generates a markdown/HTML document listing some meta data and the content of all files specified:

---
title: "Collection of SQL files"
author: "SQLCollectR"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output: 
  html_document: 
    keep_md: yes
---

```{r setup, echo = FALSE}
library(knitr)
path <- "files/"
extension <- "sql"
```

This document contains the code from all files with extension ``r extension`` in ``r paste0(getwd(), "/", path)``.

```{r, results = "asis", echo = FALSE}
fileNames <- list.files(path, pattern = sprintf(".*%s$", extension))
fileInfos <- file.info(paste0(path, fileNames))

for (fileName in fileNames) {
  filePath <- paste0(path, fileName)
  cat(sprintf("## File `%s` \n\n### Meta data \n\n", fileName))
  cat(sprintf(
    "| size (KB) | mode | modified |\n|---|---|---|\n %s | %s | %s\n\n", 
    round(fileInfos[filePath, "size"]/1024, 2), 
    fileInfos[filePath, "mode"],
    fileInfos[filePath, "mtime"]))
  cat(sprintf("### Content\n\n```\n%s\n```\n\n", paste(readLines(filePath), collapse = "\n")))
}

```

All the work is done in the for loop which iterates over all files in path whose names end in extension. For each file, a table with "meta data" is printed, followed by the actual file content. The meta data is retrieved using file.info and consists of file size, mode and last modified timestamp.

The cat(sprintf(... constructs containing markdown make the code look complicated, but it fact it is fairly simple.

Sample output

Using SQL files with SQL statements from this answer, the RMD file above generates the following output (using HTML as output format):

Sample output

Community
  • 1
  • 1
CL.
  • 14,577
  • 5
  • 46
  • 73
1

In order to read content from a text file that does not represent a delimited table, you might need to use readLines instead of read.table. The R way of doing this would use lapply:

files <- list.files(path=path, pattern="*.sql", full.names=T, recursive=FALSE)
out.files <- lapply(files,readLines)

This will give you a list containing a vector of characters (each element is one line of the file).

EDIT:

To answer the rest of your question, this sort of data could be converted into a single text file using writeLines.

names(out.files)<-files
printer = file("out.sql","w");
lapply(files,function (x) 
{
   writeLines(x,printer);
   writeLines(out.files[[x]],printer);
})
close(printer)

I would only do all this if you are doing some other manipulation in R, otherwise there are easier ways to append a bunch of files into one file.

Craig
  • 4,492
  • 2
  • 19
  • 22
  • Hi Thank you. But is there a way to separate/include the file name itself into the produced .txt. Without that its just .txt with just .sql statements – Mamba Jan 29 '16 at 15:46
  • If the names have not been attached to out.files, you can use names(out.files)<-files. Is this sufficient for you to create your output file? – Craig Jan 29 '16 at 16:04
  • The output looks very unstructured. I'm using `write.list {erer}`. – Mamba Jan 29 '16 at 16:19
  • @HiThere Perhaps you should try `writeLines` as in the answer instead of `write.list {erer}`. – Gregor Thomas Jan 29 '16 at 19:48