I am trying to use TypeScript to generate a batch processing job that processes a number of files from Google Cloud Storage and maps a set of workers to the batches therein.
I am using the `@google-cloud/storage` package to read the files and generate a set of batches based on file ID, i.e.:
```typescript
import * as gcs from '@google-cloud/storage';
import * as _ from 'lodash';

function getBatches(
  storage: gcs.Storage,
  bucketName: string,
  batchSize: number,
  maxBatches: number,
): void {
  // Lists files in the bucket
  storage.bucket(bucketName).getFiles().then(files => {
    // ... omitted for brevity
    batches = _.slice(batches, 0, maxBatches);
    // how to return?
  }).catch(console.error);
}
```
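Since a function whose result only exists inside a `then` callback cannot hand it back synchronously, one option is to have `getBatches` return the promise itself. Below is a minimal sketch under stated assumptions: the real batching logic is omitted above, so the hypothetical `chunkBatches` simply splits file names into consecutive groups of `batchSize`, and the hypothetical `listFileNames` parameter stands in for the GCS listing call so the sketch runs without a bucket.

```typescript
// Hypothetical stand-in for the omitted batching logic:
// splits names into consecutive groups of `batchSize`.
function chunkBatches(names: string[], batchSize: number): string[][] {
  if (batchSize <= 0) {
    throw new RangeError("batchSize must be positive");
  }
  const batches: string[][] = [];
  for (let i = 0; i < names.length; i += batchSize) {
    batches.push(names.slice(i, i + batchSize));
  }
  return batches;
}

// Returns a Promise instead of `void`, so callers can await the result.
// `listFileNames` is a hypothetical injection point for the GCS listing call.
async function getBatches(
  listFileNames: () => Promise<string[]>,
  batchSize: number,
  maxBatches: number,
): Promise<string[][]> {
  const names = await listFileNames();
  // Same effect as _.slice(batches, 0, maxBatches) in the original.
  return chunkBatches(names, batchSize).slice(0, maxBatches);
}
```

With the real client, `listFileNames` could be something like `async () => (await storage.bucket(bucketName).getFiles())[0].map(f => f.name)`, since `getFiles()` resolves to a tuple whose first element is the file list. The `.catch(console.error)` is deliberately dropped so that listing errors propagate to the caller instead of silently producing no batches.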
The problem is that the wrapping class I use to create the deployment strategy requires this state to be present when the constructor completes, i.e.:
```typescript
export class StorageWorkers ... {
  this.jobs ...

  constructor ... {
    ...
    const batches = getBatches(...);
    for (const x of batches) {
      this.jobs[x] = new worker.Deployment(x, ...); // requires sync
    }
    // state should be set
  }
}
```
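A common pattern for this situation is an async static factory: do the async work first, then pass the already-resolved data to a synchronous (private) constructor. A minimal sketch, where `Deployment` is a hypothetical stand-in for `worker.Deployment`, `fetchBatches` is a hypothetical stand-in for a promise-returning `getBatches`, and the `batch-<index>` key scheme is an assumption. Whether the surrounding framework allows the object to be created from an `await` like this is also an assumption to check.

```typescript
// Hypothetical stand-in for worker.Deployment.
class Deployment {
  constructor(public readonly batch: string[]) {}
}

class StorageWorkers {
  // Synchronous constructor: all async work has already happened.
  private constructor(public readonly jobs: Map<string, Deployment>) {}

  // Async factory: await the batches, then build the state synchronously.
  static async create(
    fetchBatches: () => Promise<string[][]>,
  ): Promise<StorageWorkers> {
    const batches = await fetchBatches();
    const jobs = new Map<string, Deployment>();
    batches.forEach((batch, i) => {
      // Hypothetical key scheme: "batch-<index>".
      jobs.set(`batch-${i}`, new Deployment(batch));
    });
    return new StorageWorkers(jobs);
  }
}
```

Usage would be `const workers = await StorageWorkers.create(() => getBatches(...))`. Making the constructor private ensures no caller can obtain an instance whose `jobs` are not yet populated.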
So the question is: how can I make `getBatches` synchronous so that its output can be used to set the state of the `StorageWorkers` class?
I have tried implementing the following in `getBatches`, but it always returns an empty array, because the `then` callback runs asynchronously, after `return outs` has already executed.
```typescript
getBatches(...) ... {
  let outs = [];
  ...then(files => {
    // ... omitted for brevity
    batches = _.slice(batches, 0, maxBatches);
    outs.push(batches); // runs later, after `return outs` below
  }).catch(console.error);
  return outs; // still empty at this point
}
```
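The ordering problem can be reproduced without GCS. In this minimal sketch, an already-resolved `Promise.resolve(...)` is an assumed stand-in for `getFiles()`; even so, the `then` callback is queued as a microtask and only runs after the synchronous `return`, so the caller sees an empty array.

```typescript
function getBatchesBroken(): string[][][] {
  const outs: string[][][] = [];
  // Stand-in for storage.bucket(...).getFiles(): already resolved.
  Promise.resolve([["a", "b"]]).then((batches) => {
    // Runs as a microtask, after getBatchesBroken has returned.
    outs.push(batches);
  });
  return outs;
}

const result = getBatchesBroken();
console.log(result.length); // 0: the callback has not run yet
// A timer callback runs only after pending microtasks have drained.
setTimeout(() => console.log(result.length), 0); // 1
```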
I also can't use a callback to set the state, for the same reason: the callback would only run after the constructor has already returned.
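If the object must be constructed synchronously, another option is to keep the pending work itself as state: the constructor starts the async call and stores a promise that consumers await before reading `jobs`. A minimal sketch; the `StorageWorkersLazy` class, its `ready` and `jobs` members, and the `fetchBatches` parameter are all hypothetical, and whether your deployment framework can consume state that arrives this way is an assumption to verify.

```typescript
class StorageWorkersLazy {
  // Populated only after `ready` resolves.
  readonly jobs = new Map<string, string[]>();
  // Consumers must await this before reading `jobs`.
  readonly ready: Promise<void>;

  constructor(fetchBatches: () => Promise<string[][]>) {
    // Start the async work; the constructor itself returns immediately.
    this.ready = fetchBatches().then((batches) => {
      batches.forEach((batch, i) => this.jobs.set(`batch-${i}`, batch));
    });
  }
}
```

Usage would be `const w = new StorageWorkersLazy(() => getBatches(...)); await w.ready;`. The trade-off compared with an async factory is that the object exists in a half-initialized state, so every consumer has to remember to await `ready`.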
What is the appropriate way to implement this when a wrapping class needs to derive its state from async code?