2

I need to uncompress a .gz file and store it in a variable, so I can use it later. So, the idea is that I generate *.fastq.gz files, and I need to uncompress them and keep just the *.fastq file. Then, I would like to store its name in a variable, so I can call the file for further processing.

Here is the code I am executing. The input is $file.fastq.gz, where $file is the name of the file (it changes, because this code runs inside a loop):

reads=$(gunzip $file.fastq)
echo $reads

Does anybody know what is wrong with this code? Why does it produce no output, and why does the program hang at that point? Thank you very much! ;)

user3379797
  • In addition to what other folks are saying, you need more quotes. `echo $reads`, as opposed to `echo "$reads"`, will have some serious bugs (changing newlines to spaces, expanding wildcards, etc). – Charles Duffy Mar 04 '14 at 16:19

3 Answers

5

If the input file is $file.fastq.gz, the resulting output file is just that file with the .gz extension removed.

gunzip "$file.fastq.gz" & gunzip_pid=$!
reads="$file.fastq"
# Do some more work that doesn't depend on the contents of $file.fastq
# ...
wait "$gunzip_pid" || { echo "Problem with gunzip" >&2; exit 1; }
# Do something with the now-complete $file.fastq here
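Since the question runs this inside a loop, the same idea can be sketched without the background job. This is only an illustration under stated assumptions: the temporary directory and the sampleA/sampleB file names are made up, and `${gz%.gz}` is ordinary shell parameter expansion that strips a trailing `.gz` from the name.

```shell
#!/bin/sh
# Hypothetical setup: two small compressed files standing in for real data.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/sampleA.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$workdir/sampleB.fastq"
gzip "$workdir/sampleA.fastq" "$workdir/sampleB.fastq"

processed=""
for gz in "$workdir"/*.fastq.gz; do
    # gunzip replaces foo.fastq.gz with foo.fastq; record the name only on success.
    if gunzip "$gz"; then
        reads=${gz%.gz}                       # name of the decompressed file
        echo "decompressed: $reads"
        processed="$processed ${reads##*/}"   # keep just the base name
    else
        echo "Problem with gunzip on $gz" >&2
    fi
done
```

Here `$reads` holds the file name, not the file contents, so it can be passed to whatever tool comes next in the loop body.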

(Original answer to misinterpreted question, saved as a useful non sequitur.)

You need to tell gunzip to write the uncompressed stream to standard output, rather than uncompressing the file in-place.

reads=$(gunzip -c "$file.fastq.gz") || { echo "Problem with gunzip" >&2; exit 1; }
echo "$reads"
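To make the difference from in-place decompression concrete, here is a small sketch. The temporary directory and the `sample` file name are assumptions for illustration only; the point is that with `-c` the variable receives the file's contents, while the `.gz` file stays on disk and no `.fastq` file is created.

```shell
#!/bin/sh
# Hypothetical setup: one small compressed file.
workdir=$(mktemp -d)
file="$workdir/sample"
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$file.fastq.gz"

# -c sends the decompressed data to standard output, so $(...) captures the contents.
reads=$(gunzip -c "$file.fastq.gz") || { echo "Problem with gunzip" >&2; exit 1; }
printf '%s\n' "$reads"

# The compressed file is untouched and nothing was written next to it.
ls "$workdir"
```

Note that command substitution holds the whole decompressed file in memory and strips trailing newlines, which may matter for large FASTQ files.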
chepner
  • @user3379797 Only one of the two answers posted here will address your need; please indicate which one (if either) is what you want. – chepner Mar 04 '14 at 16:25
  • hey, thank you! the second answer fits more accurately what I am looking for, so I will take that one. I tried it and it works, it just requires some computational time as the .gz file is quite big, but it is working right now. thanks again – user3379797 Mar 04 '14 at 16:59
  • Depending on how soon you need the results of the decompression, you can do it in the background, continue with the rest of your script, then `wait` on the `gunzip` when you actually need `$file.fastq`. – chepner Mar 04 '14 at 17:00
  • thanks for the advice, but I can not do it this way because I need the decompressed file to continue with the script, as the next step uses it. – user3379797 Mar 04 '14 at 17:10
  • Please check whether this solution leaves a bogus value in $reads if the .gz is corrupt or badly formatted, or if the gunzip command fails for some other reason. – Nishant Shrivastava Mar 04 '14 at 17:36
  • This works only if the filename is known ahead of time. – crash springfield Jul 26 '18 at 15:57
  • The OP has the filename ahead of time. Determining what the input is called seems like an entirely separate question. – chepner Jul 26 '18 at 16:16
0

1) `reads=$(gunzip $file.fastq)`: first, you should be running gunzip on the .gz file.

2) `echo $reads`: gunzip decompresses the file in place and prints nothing, so the command substitution captures an empty string. You cannot expect the variable reads to hold the name of the uncompressed file.

Instead, you should use:

gunzip "$file.fastq.gz"
if [[ $? -eq 0 ]]
then
    reads="$file.fastq"
fi

Or, using the shorter syntax suggested by Charles:

if gunzip "$file.fastq.gz"
then
    reads="$file.fastq"
fi
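The two forms above test the same exit status. The sketch below illustrates that, along with what happens when another command runs before `$?` is checked. It is an illustration only: the temporary directory and the deliberately invalid `broken.fastq.gz` file are made up so that gunzip fails every time.

```shell
#!/bin/sh
set +e   # make sure a failing command does not abort this demonstration

# Hypothetical setup: a file that has a .gz name but is not gzip data.
workdir=$(mktemp -d)
file="$workdir/broken"
echo "this is not gzip data" > "$file.fastq.gz"

# Form 1: test the command directly.
if gunzip "$file.fastq.gz" 2>/dev/null; then direct=ok; else direct=failed; fi

# Form 2: check $? on the very next line.
gunzip "$file.fastq.gz" 2>/dev/null
if [ $? -eq 0 ]; then checked=ok; else checked=failed; fi

# Form 3: a log line in between replaces $? with the status of echo.
gunzip "$file.fastq.gz" 2>/dev/null
echo "finished decompressing file"
if [ $? -eq 0 ]; then clobbered=ok; else clobbered=failed; fi

echo "direct=$direct checked=$checked clobbered=$clobbered"
```

The first two forms both report the failure. The third reports success even though gunzip failed, because `$?` now refers to the `echo`.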
  • 1
    Checking `$?` for zero-vs-nonzero is silly -- you could just do `if gunzip ...; then ...` and not use `$?` at all. Doing it as a separate step just makes it easier to introduce bugs by having log statements or other content change the value stored in `$?`. – Charles Duffy Mar 04 '14 at 16:21
  • If you are checking the value of $? just after the line where you have done the gunzip .. what can change the value of $? .. Can you be a little explicit about what log statements/other contents are coming in between? – Nishant Shrivastava Mar 04 '14 at 17:18
  • Also I am not sure if a comment was deleted here... but somebody questioned the need for the gunzip line.. i guess it has been made fairly clear that we need to uncompress the file.. this is what has been written : ' the idea is that I generate *.fastq.gz files, and I need to uncompress them and keep just the *.fastq file.' – Nishant Shrivastava Mar 04 '14 at 17:22
  • I'm making the point that this is fragile, not that it's currently broken as-written. I'm saying that if someone adds an `echo "finished decompressing file"` immediately after the `gunzip`, that would break it... and for what? Using `if gunzip ...` avoids coupling those lines, meaning that logging or other logic could be added without side effects, and is smaller and easier to read. – Charles Duffy Mar 04 '14 at 17:23
  • It might be an alternate syntax that can be considered. $? returns the status of the last executed command, so if somebody introduces an echo in the middle, it would certainly introduce the bug. The currently accepted answer can have a bogus value in $reads if the .gz file is corrupt or badly formatted; the above solution, as written now, would not have any bogus values in $reads. I would need to check whether the 'if gunzip ..' syntax works, because I am not sure gunzip returns a value that would be interpreted as false if an error occurs. – Nishant Shrivastava Mar 04 '14 at 17:34
  • If `if gunzip` wouldn't work, neither would checking `$?` -- they both use the same exit status. – Charles Duffy Mar 04 '14 at 17:39
  • I verified that your syntax works. I was under the impression that `if` might act on the output of the gunzip command rather than its exit code. – Nishant Shrivastava Mar 04 '14 at 17:49
0

Use zcat:

reads=$(zcat "$file.fastq.gz")
steffen