I searched online but I didn't find anything that could answer my question.

I'm using a Java tool on Ubuntu Linux, calling it from a bash command; this tool takes two paths for two different input files:

# FASTQ:  first read file of pair
# FASTQ2: second read file of pair
java -Xmx8G -jar picard.jar FastqToSam \
    FASTQ=6484_snippet_1.fastq \
    FASTQ2=6484_snippet_2.fastq \
    [...]

What I'd like to do is, for example, instead of specifying the path of a single file for FASTQ, specify the paths of two different files.

So instead of running cat file1 file2 > File and using File as the input for FASTQ, I'd like the concatenation to happen on the fly, without saving File on the file system (which is what happens with the command cat file1 file2 > File).

I hope I've explained my question clearly; if not, just ask and I'll try to explain better.

Vzzarr
  • So you basically want to pass the arguments as the contents of those two `.fastq` files? – Inian Nov 10 '17 at 10:55
  • I didn't understand anything... – m0skit0 Nov 10 '17 at 11:02
  • Yes; to be as clear as possible, let's say that I'd like to specify 2 input files for `FASTQ` and 2 input files for `FASTQ2`. As you can see, the Java tool normally takes 2 file paths as input, but for each of these file paths I'd like to specify 2 files (so a total of 4 files in this case). – Vzzarr Nov 10 '17 at 11:04

1 Answer

Most well-written shell commands which accept a file name argument also accept a list of file name arguments, for example cat file or cat file1 file2.

If the program you are trying to use doesn't support this, and cannot easily be fixed, perhaps your OS or shell makes /dev/stdin available as a pseudo-file. There is only one standard input, so this can feed only one of the two arguments.

cat file1 file2 | java -mumble -crash -burn FASTQ=/dev/stdin
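As a rough sketch of the /dev/stdin idea, the snippet below swaps the Java tool for a hypothetical stand-in function, `path_only_tool`, which accepts only an `INPUT=path` argument (the function name and the `INPUT=` key are made up for illustration and are not part of Picard). It assumes the OS exposes /dev/stdin, as Linux does.

```shell
#!/bin/sh
# Hypothetical stand-in for a tool that insists on a file path argument
# of the form INPUT=path (this is NOT the real Picard command line).
path_only_tool() {
    path=${1#INPUT=}          # strip the INPUT= prefix
    wc -l < "$path"           # open the path like any other file
}

# Two small sample files in a scratch directory.
workdir=$(mktemp -d) || exit
printf 'a\nb\n' > "$workdir/file1"
printf 'c\n'    > "$workdir/file2"

# The concatenation travels through the pipe; no combined file is saved.
line_count=$(cat "$workdir/file1" "$workdir/file2" | path_only_tool INPUT=/dev/stdin)
line_count=$((line_count))    # normalize any padding some wc versions add
echo "lines seen by the tool: $line_count"

rm -rf "$workdir"
```

Whether a given program copes with this depends on how it reads its input: the data arrives through a pipe, so a program that needs to seek or reopen the file may fail.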

Some shells (Bash, ksh, and zsh, but not plain POSIX sh) also have process substitutions, which (typically) look to the calling program like a single file containing whatever the process substitution produces on standard output.

java -mumble -crash -burn FASTQ=<(cat file1 file2) FASTQ2=<(cat file3 file4)
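To try the pattern without the Java tool, here is a minimal sketch in which a hypothetical function, `two_input_tool`, stands in for it. The function and the sample file contents are assumptions for illustration only; the `FASTQ=` and `FASTQ2=` keys are borrowed from the command above. It needs a shell with process substitution, so it is written for bash.

```shell
#!/usr/bin/env bash
# Process substitution is a bash/ksh/zsh feature; plain POSIX sh lacks it.

# Hypothetical stand-in for a tool taking two KEY=path arguments.
two_input_tool() {
    local first=${1#FASTQ=} second=${2#FASTQ2=}
    echo "FASTQ has $(( $(wc -l < "$first") )) lines, FASTQ2 has $(( $(wc -l < "$second") )) lines"
}

# Four tiny sample files, four lines each, in a scratch directory.
workdir=$(mktemp -d) || exit
for n in 1 2 3 4; do
    printf '@read%s\nACGT\n+\nIIII\n' "$n" > "$workdir/file$n"
done

# Each <(...) expands to a path (such as /dev/fd/63) connected to a pipe,
# so the tool receives two path arguments and no combined file is saved.
report=$(two_input_tool FASTQ=<(cat "$workdir/file1" "$workdir/file2") \
                        FASTQ2=<(cat "$workdir/file3" "$workdir/file4"))
echo "$report"

rm -rf "$workdir"
```

Unlike /dev/stdin, this can supply several inputs at once, because each substitution gets its own pipe.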

If neither of these works, a simple shell script which uses temporary files and deletes them when it's done is a tried and true solution.

#!/bin/sh
: ${4?Need four file name arguments, will process them pairwise}
t=$(mktemp -d -t fastqtwoness.XXXXXXX) || exit
trap 'rm -rf "$t"' EXIT HUP INT TERM  # remove in case of failure or when done
# concatenate each pair into a temporary input file
cat "$1" "$2" >"$t/1.fastq"
cat "$3" "$4" >"$t/2.fastq"
# no exec here: exec would replace the shell, so the EXIT trap would never run
java -mumble -crash -burn FASTQ="$t/1.fastq" FASTQ2="$t/2.fastq"
tripleee
  • `cat` is [useless](http://www.iki.fi/era/unix/award.html) when you use it on a *single* file. Its very *purpose* is to concatenate multiple files after each other. The single file example is indeed practically useless, but shows a general design pattern ... But thanks for the opportunity to preach it (-: – tripleee Nov 10 '17 at 11:16
  • I tried this [command](https://pastebin.com/Zef7uhgU) but I received this [exception](https://pastebin.com/Bb80d1Pb). The command that you suggested would probably work, and the problem is in this particular `.jar`. But I can say that when I used the `File` produced by `cat file1 file2 > File`, the command ran without problems. P.S. The Java tool accepts .gz files; the answers [here](https://stackoverflow.com/questions/8005114/fast-concatenation-of-multiple-gzip-files) say it's possible to cat 2 .gz files (or is that a different case?). P.P.S. Sorry for deleting, but I was still working on this comment – Vzzarr Nov 10 '17 at 11:45
  • Indeed, the question you link to suggests it should work. I tried this out locally before posting my (now-deleted) comment but I guess my experiment was somehow wrong because I can't reproduce it myself now. However, `gzip` is nervous about opening "files" which do not have a `.gz` suffix at least in the command-line version. On the command line, `gzip -d – tripleee Nov 10 '17 at 11:56
  • I even tried with [this](https://pastebin.com/RwwL7GFF) (stack trace included), but as you can see, the exception is now just more readable because I used decompressed files. I think the problem is that the Java tool accepts a file path as input, and with cat it receives a """stream""" instead. I think that for the moment the best solution would be the script that you suggested, but I'll wait for other solutions in case someone else has a good idea ;) – Vzzarr Nov 10 '17 at 12:39
  • Looks to me like your file isn't in the expected format. I'm not familiar enough with FASTA/FASTQ to say off hand what the error message really indicates but it's pretty clear that your input violates the line 1 requirement https://en.wikipedia.org/wiki/FASTQ_format#Format – tripleee Nov 10 '17 at 12:57
  • To give you an idea, [here](https://pastebin.com/vBMBN0Xt) are the first lines of the two original files; this is genomics data, and .fastq is an extension for a genome file format. As you can see, there is an @ for each "sample"; since the tool raises that exception, maybe some characters get lost or reordered by the cat... I don't know. Anyway, don't worry, I don't want to bother you more than I already have ;) – Vzzarr Nov 10 '17 at 13:16
  • Your trace indicates that line 1 in the input file begins with TCTCGACGCCCCC so your paste data does not seem to correspond to the data which the Java program actually received (this sequence is not present in your newest pastebin). You can do `FASTQ=<(cat file1 file2 | tee /tmp/check)` to get a copy of the exact result of the `cat` in `/tmp/check` but `cat` does not modify its inputs in any way. But yeah, probably accept this answer and start a new question if you still have issues which are basically unrelated to the problem you were asking about. – tripleee Nov 10 '17 at 13:41
  • @Vzzarr what `<(cat ..)` produces as far as java is concerned is really a filename, something like /dev/fd/63. It is a filename argument and the file is a non-seekable pipe. – Johan Boulé Jan 23 '22 at 19:14
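
The last two comments can be checked from a bash prompt. The following is only a sketch with made-up sample files: it prints the argument a program actually receives from a process substitution, tests whether that path is a pipe, and uses `tee` to keep an exact copy of what went through it. The helper `describe` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d) || exit
printf 'one\n' > "$workdir/file1"
printf 'two\n' > "$workdir/file2"

# Print the argument itself: it is a path, not the data.
substituted_path=$(echo <(cat "$workdir/file1" "$workdir/file2"))
echo "argument seen by the program: $substituted_path"

# Hypothetical helper: report whether a path is a pipe (FIFO).
describe() {
    if [ -p "$1" ]; then echo pipe; else echo "not a pipe"; fi
}
kind=$(describe <(cat "$workdir/file1" "$workdir/file2"))
echo "file type: $kind"

# tee saves an exact copy of whatever is sent through the pipe.
received=$(cat <(cat "$workdir/file1" "$workdir/file2" | tee "$workdir/check"))
copy=$(cat "$workdir/check")
[ "$received" = "$copy" ] && echo "copy matches what the reader received"

rm -rf "$workdir"
```

The exact path printed varies by system (for example /dev/fd/63 on Linux), which is why the sketch prints it rather than assuming a value.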