- Are the files in question in Cloud Storage?
- Are the files in question on a local or network drive?
In your description, you indicate "What I need is to compress several files into a single gzip". It isn't clear to me that a cluster of computers is needed for this. It sounds more like you just want to use tar along with gzip.
The tar utility will create an archive file it can compress it as well. For example:
$ # Create a directory with a few input files
$ mkdir myfiles
$ echo "This is file1" > myfiles/file1.txt
$ echo "This is file2" > myfiles/file2.txt
$ # (C)reate a compressed archive
$ tar cvfz archive.tgz myfiles/*
a myfiles/file1.txt
a myfiles/file2.txt
$ # (V)erify the archive
$ tar tvfz archive.tgz
-rw-r--r-- 0 myuser mygroup 14 Jul 20 15:19 myfiles/file1.txt
-rw-r--r-- 0 myuser mygroup 14 Jul 20 15:19 myfiles/file2.txt
To extract the contents use:
$ # E(x)tract the archive contents
$ tar xvfz archive.tgz
x myfiles/file1.txt
x myfiles/file2.txt
UPDATE:
In your updated problem description, you have indicated that you may have multiple orders processed simultaneously. If the frequency in which results need to be tar-ed is low, and providing the tar-ed results is not extremely time-sensitive, then you could likely do this with a single node.
However, as the scale of the problem ramps up, you might take a look at using the Pipelines API.
Rather than keeping a fixed cluster running, you could initiate a "pipeline" (in this case a single task) when a customer's order completes.
A call to the Pipelines API would start a VM whose sole purpose is to download the customer's files, tar them up, and push the resulting tar file into Cloud Storage. The Pipelines API infrastructure does the copying from and to Cloud Storage for you. You would effectively just need to supply the tar command line.
There is an example that does something similar here:
https://github.com/googlegenomics/pipelines-api-examples/tree/master/compress
This example will download a list of files and compress each of them independently. It could be easily modified to tar the list of input files.
Take a look at the https://github.com/googlegenomics/pipelines-api-examples github repository for more information and examples.
-Matt