I'll explain my problem first, as it's important for understanding what I want :-).
I'm working on a pipeline written in Python that uses several external tools to perform genomics data analyses. One of these tools works with very large fastq files, which in the end are nothing more than plain text files.
Usually these fastq files are gzipped, and as they are plain text, the compression ratio is very high. Most data analysis tools can work with gzipped files, but we have a few that can't. So what we do is unzip the files, work with them, and finally re-compress them.
As you may imagine, this process is:
- Slower
- Disk consuming
- Bandwidth consuming (if working on an NFS filesystem)
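For reference, the unzip / work / re-compress round trip described above can be sketched roughly as follows. This is only an illustration, not the actual pipeline code: the function name `run_tool_on_gzipped` is made up, and it assumes the tool takes the file path as its last argument and that the input name ends in `.gz`.

```python
import gzip
import os
import shutil
import subprocess


def run_tool_on_gzipped(tool_cmd, gz_path):
    """Unzip gz_path, run the tool on the plain file, then re-compress.

    tool_cmd is the tool's command line as a list, without the input path.
    Assumption: gz_path ends in ".gz".
    """
    plain_path = gz_path[:-3]  # strip the ".gz" suffix

    # Step 1: write a full uncompressed copy to disk (the expensive part).
    with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)

    try:
        # Step 2: the tool only ever sees an ordinary plain text file.
        return subprocess.run([*tool_cmd, plain_path], check=True)
    finally:
        # Step 3: re-compress and drop the plain copy, even if the tool failed.
        with open(plain_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(plain_path)
```

Every run pays for a full uncompressed copy on disk plus a second compression pass, which is where the three costs listed above come from.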
So I'm trying to figure out a way of "tricking" these tools into working directly with gzipped files, without having to touch their source code.
I thought of using FIFO files, and I tried that, but it doesn't work if the tool reads the file more than once, or if the tool seeks around in the file.
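To make the FIFO attempt concrete, here is a minimal sketch of the idea (again not the exact code from the pipeline). The name `run_tool_via_fifo` is hypothetical, it assumes a POSIX system where `os.mkfifo` is available, and it assumes the tool takes the input path as its last argument.

```python
import gzip
import os
import shutil
import subprocess
import tempfile
import threading


def run_tool_via_fifo(tool_cmd, gz_path):
    """Stream the decompressed contents of gz_path to the tool through a FIFO."""
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "reads.fastq")
    os.mkfifo(fifo_path)

    def feed():
        try:
            # Opening the FIFO for writing blocks until the tool opens it for reading.
            with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
        except BrokenPipeError:
            pass  # the tool closed the FIFO before reading everything

    # Daemon thread: if the tool never opens the FIFO, the writer must not
    # keep the interpreter alive.
    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        # The tool is handed the FIFO path as if it were a regular file.
        return subprocess.run([*tool_cmd, fifo_path], check=True)
    finally:
        writer.join(timeout=5)
        os.remove(fifo_path)
        os.rmdir(tmpdir)
```

This only works for a tool that reads its input once, front to back. A FIFO is a one-way byte stream, so a seek fails and a second open finds no writer on the other end, which matches the failures described above.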
So basically I have two questions:
Is there any way to map a file into memory so that you can do something like:
./tool mapped_file
(where mapped_file is not really a file, but a reference to a memory-mapped file)?
Do you have any other suggestions about how I can achieve my goal?
Thank you very much, everybody!