I want to combine awk and R. I have a set of *.txt files in a given directory, and I don't know the length of the header in each file: in some cases I have to skip 25 lines, in others 27, and so on. So I want to run an awk command to get the number of lines to skip. Once I have this value, I can begin processing the data with R.

Furthermore, in the R file I combine R and bash, so my code looks like this:

#!/usr/bin/env Rscript
...
argv <- commandArgs(trailingOnly = TRUE)
# error checking...
import_file <- argv[1]
export_file <- argv[2]
# your function call
format_windpro(import_file, export_file)

Where and how can I type my awk command? Thanks!

I tried what you suggested for the awk command and I still get an error. R doesn't recognize my command, so I cannot pass the number of lines to skip to my function. Here is my code:

nline <- paste('$(grep -n 'm/s' import_file |awk -F":" '{print $1}')')

nline <- scan(pipe(nline),quiet=T)

I look for the pattern m/s in the first column in order to find where my header text is. I am using R under Windows 7.

JPV

2 Answers

Besides Vincent's hint of using system("awk ...", intern=TRUE), you can also use the pipe() function that is part of the usual text connections:

R> sizes <- read.table(pipe("ls -l /tmp | awk '!/^total/ {print $5}'"))
R> summary(sizes)
       V1          
 Min.   :       0  
 1st Qu.:     482  
 Median :    4096  
 Mean   :   98746  
 3rd Qu.:   13952  
 Max.   :27662342  
R> 

Here I pipe the output of a command into awk and then read everything awk prints. That output could also be a single line:

R> cmd <- "ls -l /tmp | awk '!/^total/ {sum = sum + $5} END {print sum}'"
R> totalsize <- scan(pipe(cmd), quiet=TRUE)
R> totalsize
[1] 116027050
R> 
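
Applied to the question's problem, the same pipe() idea might look like the sketch below. It is only an illustration under assumptions: the sample file contents are made up, the header is assumed to end at the first line containing m/s, and a Unix-like shell with awk on the PATH is assumed (quoting behaves differently under the Windows command interpreter, which shQuote() tries to account for).

```r
# Hypothetical sample file standing in for one of the *.txt exports:
# three header lines, the last of which carries the "m/s" units marker.
import_file <- tempfile(fileext = ".txt")
writeLines(c("Site: example",
             "Generated by some tool",
             "speed[m/s] direction[deg]",
             "5.1 180",
             "6.3 175"),
           import_file)

# awk prints the number of the first line matching m/s, then exits.
# The slash is escaped for awk; R needs "\\" to produce one backslash.
awk_prog <- "/m\\/s/ {print NR; exit}"
cmd <- paste("awk", shQuote(awk_prog), shQuote(import_file))

# scan() opens the pipe, reads the single number, and closes it again.
nline <- scan(pipe(cmd), quiet = TRUE)

# Skip the header (including the units line) and read the data rows.
wind <- read.table(import_file, skip = nline)
```

Building the command with shQuote() instead of hand-written quotes avoids the nested-quote problem in the question's paste() call, where the inner single quotes end the R string early.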
Dirk Eddelbuettel

You can use system to run an external program from R.

system("gawk --version", intern=TRUE)
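
For the header-length question, a call along these lines could replace `--version`. This is a rough sketch, not a tested solution for the asker's files: `count_header_lines` is a hypothetical helper name, the sample file is invented, and awk is assumed to be on the PATH.

```r
# Hypothetical helper: returns the number of the first line in `path`
# that matches the awk regular expression `pattern`.
count_header_lines <- function(path, pattern = "m\\/s") {
  awk_prog <- sprintf("/%s/ {print NR; exit}", pattern)
  cmd <- paste("awk", shQuote(awk_prog), shQuote(path))
  out <- system(cmd, intern = TRUE)   # character vector, one element per output line
  if (length(out) == 0) stop("pattern not found in ", path)
  as.integer(out[1])                  # convert "3" to 3
}

# Made-up sample file: two header lines, a units line, then data.
sample_file <- tempfile(fileext = ".txt")
writeLines(c("header line 1", "header line 2", "v[m/s] dir", "4.2 90"),
           sample_file)

nline <- count_header_lines(sample_file)
```

In the Rscript from the question, the call would go after `import_file <- argv[1]`, and `nline` could then be used as the `skip` value when reading the file.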
Vincent Zoonekynd
  • But where do I use it, in my script or in my SAmpleStatus.r file? And how does it run? Can you be a little more precise? Thanks – JPV Mar 02 '12 at 12:05
  • You can use it in your R script to call your awk script (you need to replace `--version`, of course); it returns the output of your awk script, as a vector of strings. – Vincent Zoonekynd Mar 02 '12 at 12:17