I am building a processing pipeline for genomic data for my master's thesis, and I am using Argo.
I have a fully functioning processing workflow implemented in Argo Workflows, and now I am trying to create an EventSource that detects when the sequencer writes a new run folder (the folder name should then be passed to the workflow through a Sensor).
The first problem is that the sequencer takes some time to write all the data, so I cannot start the workflow as soon as the base directory is created. The idea is therefore to wait for a specific file inside the new run folder to be created, and then start the workflow.
To simulate this, I am copying an old run folder into the watched directory. I have implemented the following EventSource, which does not yet listen for the specific file mentioned above, only for the run folder. It works: the event is detected.
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: directory-event-source
  namespace: tesi-fabrici
spec:
  template:
    container:
      volumeMounts:
        - mountPath: /test_dir
          name: test-dir
    volumes:
      - name: test-dir
        nfs:
          server: 10.128.2.231
          path: /tesi_fabrici
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4"
        # pathRegexp: TODO with regex
      eventType: CREATE
Next, I simulated the behaviour described above by first copying all the data except for that one file, and then copying that file last. This is the script I use:
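#!/bin/bash
inputDirName=$1   # run folder name
inputDirPath=$2   # parent directory of the run folder (with trailing slash)
sampleSheet=$3    # file to copy last
outputPath=$4     # watched directory (with trailing slash)
# Copy the whole run folder except the sample sheet.
rsync -hr --progress --exclude "$sampleSheet" "$inputDirPath$inputDirName" "$outputPath"
# Copy the sample sheet last, into the already copied run folder.
rsync -hr --progress "$inputDirPath${inputDirName}/$sampleSheet" "$outputPath$inputDirName"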
And I run it from a pod in the cluster (with the same nfs folder mounted) as below:
./copy_script.sh 210818_M70903_0027_000000000-JVRB4 /external_prod_dir/AREA/MiSeqDx/ SampleSheet.csv /external_test_dir/watched_dir/
The file in question is SampleSheet.csv. I then modified the EventSource as follows, in order to listen for the creation of the sample sheet:
...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE
The data gets copied correctly, but in this case the EventSource does not detect the creation of SampleSheet.csv.
After some testing, I noticed that the path: field expects a single file or folder name; the EventSource does not work when I give it a nested path, as in my case.
Solving this particular case is easy if I change the EventSource as follows:
...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/210818_M70903_0027_000000000-JVRB4/"
        path: "SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE
With this, the creation of the sample sheet gets caught, but the event only contains what is written in path:, and I also need the run folder name.
However, in a real scenario the run folder names change, although they follow the same pattern as the folder I am using here (210818_M70903_0027_000000000-JVRB4). My plan was therefore to use a regex to capture [path_of_new_run_folder]/SampleSheet.csv, but I don't think I can use a regex in directory:, only in pathRegexp:.
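To make the pattern concrete, here is a minimal sketch of the kind of regex I have in mind, checked in plain shell outside of Argo. The pattern is only inferred from the single folder name shown above, so the field lengths are an assumption, and run_folder_of is a hypothetical helper name. It does not show whether pathRegexp: accepts a value containing a slash; it only shows what I would like to match and which part I need to extract.

```shell
#!/bin/bash
# Assumed pattern, inferred from the one example name
# 210818_M70903_0027_000000000-JVRB4; real run names may differ.
run_regex='^[0-9]{6}_M[0-9]{5}_[0-9]{4}_[0-9]{9}-[A-Z0-9]{5}/SampleSheet\.csv$'

# Hypothetical helper: print the run folder name if the relative path
# matches the assumed pattern, otherwise return a non-zero status.
run_folder_of() {
  if printf '%s\n' "$1" | grep -Eq "$run_regex"; then
    dirname "$1"
  else
    return 1
  fi
}

# prints 210818_M70903_0027_000000000-JVRB4
run_folder_of "210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"
```

The run folder name printed here is the value I would then want to pass to the workflow through the Sensor.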
I hope I have explained my problem clearly. Please let me know how I can solve this.