I am writing a bash script that takes a command-line argument and passes it to a Python script, which uses Python's csv.writer to produce a .csv file. I have also written an R script that accepts a .csv file on its own, but I now want to pipe the CSV output directly from my Python script into my R script.

Here is my bash script:

#!/bin/bash

python protparams.py $1 | Rscript frequency.r

and my python script:

from Bio import SeqIO
from Bio.SeqUtils import ProtParam
from Bio.SeqUtils import ProtParamData
import sys
import csv

handle = open(sys.argv[1])
with open('test.csv', 'w') as fp: 
    writer = csv.writer(fp, delimiter=',')
    for record in SeqIO.parse(handle, "fasta"): 
            seq = str(record.seq)
            X = ProtParam.ProteinAnalysis(seq)
            data = [seq,X.get_amino_acids_percent(),X.aromaticity(),X.gravy(),X.isoelectric_point(),X.secondary_structure_fraction(),X.molecular_weight(),X.instability_index()]
            writer.writerow(data)

This all works fine up to here: my Python script generates the csv file when called via my bash script. Great! But when I pipe it into the following R script, as in my bash file, I get this error:

Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") : cannot open file 'NA': No such file or directory
Execution halted

Here is my R script for reference:

args <- commandArgs(trailingOnly = TRUE)
dat <- read.csv(args[1], header=TRUE)
write.csv(dat, file = "out2.csv")

(At the moment my R script is simply testing whether it can output the .csv file.)

This message normally occurs when the file doesn't exist. In this case, however, I think it appears because my R script expects a file passed as a command-line argument, which isn't being picked up the way I have written my bash script. Am I wrong in thinking that piping the output of my Python program is the same as using the output as a command-line argument for my R script?

Thanks very much.

brucezepplin
  • You might find [**this answer**](http://stackoverflow.com/a/9370949/1478381) useful... – Simon O'Hanlon May 27 '14 at 21:46
  • Untested suggestion: if sys.argv[1] is not set, use `fp=sys.stdout`; otherwise, use `fp=open('test.csv', 'w')`. Then invoke the Python script without an argument. – R Sahu May 27 '14 at 21:49
  • See http://stackoverflow.com/a/15785789/1201032 for how to write R scripts that can read their inputs from a file or a stream. – flodel May 27 '14 at 23:03

1 Answer

Yes, a small change to frequency.R (reading from stdin instead of a command-line argument) lets it copy a piped .csv:

l@np350v5c:~$ cat foo.sh 
cat gbr_Country_en_csv_v2.csv | Rscript frequency.R

l@np350v5c:~$ cat frequency.R 
f <- file("stdin")
dat <- read.csv(f, header=TRUE)
write.csv(dat, file = "out2.csv")
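
Note that a pipe only carries what the upstream program prints to standard output. The Python script in the question writes its rows to test.csv and prints nothing, so Rscript would have nothing to read from stdin. A minimal sketch of the Python side, assuming the rows are sent to `sys.stdout` instead of a file; the helper name `write_rows` and the example rows are hypothetical placeholders for the values computed in protparams.py. Since `read.csv(..., header=TRUE)` treats the first row as column names, the example also emits a header row, which is an assumption about the desired layout:

```python
import csv
import sys


def write_rows(rows, out=None):
    """Write rows as CSV to `out`; defaults to stdout so a pipe can carry them."""
    if out is None:
        out = sys.stdout
    # lineterminator='\n' avoids the csv module's default '\r\n' line endings.
    writer = csv.writer(out, delimiter=',', lineterminator='\n')
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical rows standing in for the per-record values from protparams.py.
    example_rows = [
        ["seq", "aromaticity"],  # header row, consumed by read.csv(header=TRUE)
        ["MKT", 0.0],
    ]
    write_rows(example_rows)
```

With the writer pointed at stdout, `python protparams.py $1 | Rscript frequency.R` should hand the CSV text to `file("stdin")` in the R script above.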
Luca Braglia