
I am building a processing pipeline for genomic data for my master's thesis, and I am using Argo.

I have a fully functioning processing workflow implemented in Argo Workflows, and now I am trying to create an EventSource that detects when the sequencer writes a new run folder. The folder name should then be passed to the workflow through a Sensor.

The first problem is that the sequencer takes some time to write all the data, so I cannot start the workflow as soon as the base directory is created. The idea is therefore to wait for a specific file inside the new run folder to be created, and only then start the workflow.

To simulate this, I am copying an old run folder into the watched directory. I have implemented the following EventSource, which does not yet listen for the specific file mentioned above but only for the run folder. It works: the event is detected.

apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: directory-event-source
  namespace: tesi-fabrici
spec:
  template: 
    container:
      volumeMounts:
        - mountPath: /test_dir
          name: test-dir
    volumes: 
      - name: test-dir
        nfs: 
          server: 10.128.2.231
          path: /tesi_fabrici
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4"
        # pathRegexp: TODO with regex
      eventType: CREATE

Next, I simulated the scenario described above by copying all the data except that one file, and then copying that file last. This is the script I use:

#!/bin/bash

inputDirName=$1
inputDirPath=$2
sampleSheet=$3
outputPath=$4

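# Copy the whole run folder except the sample sheet.
rsync -hr --progress "$inputDirPath$inputDirName" "$outputPath" --exclude "$sampleSheet"
# Copy the sample sheet last, so that its creation marks the end of the transfer.
rsync -hr --progress "$inputDirPath${inputDirName}/$sampleSheet" "$outputPath$inputDirName"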

I run it from a pod in the cluster (with the same NFS folder mounted) as follows:

./copy_script.sh 210818_M70903_0027_000000000-JVRB4 /external_prod_dir/AREA/MiSeqDx/ SampleSheet.csv /external_test_dir/watched_dir/

The file in question is SampleSheet.csv. I then modified the EventSource as follows, in order to listen for the creation of the sample sheet:

...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/"
        path: "210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE

The data gets copied correctly, but in this case the EventSource does not detect the creation of SampleSheet.csv. From some testing, I noticed that the path: field accepts a file name or a folder name, but the EventSource does not work when I give it a relative path that contains a subdirectory, as in my case. Solving this particular case is easy; I change the EventSource as follows:

...
...
  file:
    directoryCreated:
      watchPathConfig:
        directory: "/test_dir/watched_dir/210818_M70903_0027_000000000-JVRB4/"
        path: "SampleSheet.csv"
        # pathRegexp: TODO with regex
      eventType: CREATE

and the creation of the sample sheet gets caught, but the event only carries what is written in path:, and I also need the run folder name.
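To illustrate what I am after: if the full path of the sample sheet were available downstream, the run folder name could be recovered from it with standard shell tools. This is only a minimal sketch. run_folder_from_path is a hypothetical helper name, and I am assuming the full path is available, which is not the case with the EventSource above.

```shell
#!/bin/bash

# Hypothetical helper: given the full path of a sample sheet,
# print the name of the run folder that contains it.
# Assumption: the sample sheet sits directly inside the run folder.
run_folder_from_path() {
  local sheet_path=$1
  basename "$(dirname "$sheet_path")"
}

# Example with the test run used in this question.
run_folder_from_path "/test_dir/watched_dir/210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"
# prints: 210818_M70903_0027_000000000-JVRB4
```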

The real problem is that, in a real scenario, the run folder names change but follow the same pattern as the folder I am using here (210818_M70903_0027_000000000-JVRB4). My plan was therefore to use a regex to capture [path_of_new_run_folder]/SampleSheet.csv, but I don't think I can use a regex in directory:, only in pathRegexp:.
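For reference, this is the kind of pattern I have in mind, checked in plain bash. The regex is an assumption inferred from the single folder name above (6 digits, an instrument ID, a 4-digit run counter, 9 digits, and a 5-character suffix), and run_sheet_regex and matches_run_sheet are hypothetical names. It only tests the pattern itself; it does not show whether pathRegexp: can match across a subdirectory.

```shell
#!/bin/bash

# Assumed pattern, derived only from the example run folder
# 210818_M70903_0027_000000000-JVRB4; real runs may need a looser regex.
run_sheet_regex='^[0-9]{6}_M[0-9]{5}_[0-9]{4}_[0-9]{9}-[A-Z0-9]{5}/SampleSheet\.csv$'

# Hypothetical helper: succeeds if the given relative path looks like
# <run_folder>/SampleSheet.csv according to the assumed pattern.
matches_run_sheet() {
  [[ $1 =~ $run_sheet_regex ]]
}

if matches_run_sheet "210818_M70903_0027_000000000-JVRB4/SampleSheet.csv"; then
  echo "match"
fi
```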

I hope my problem is clear. Please let me know how I can solve this.
