I'm trying to iterate over the URL entries in a file and use each entry as input for a crawler tool. Its result should be written to a file.

Here is the .gitlab-ci.yml file:

stages:
  - test
test:
  stage: test

  tags:
    - shell-docker
  script:
    - wget https://github.com/FaKeller/sireg/releases/download/v0.3.1/sireg-linux
    - chmod 775 sireg-linux
    - mkdir output
    - ls -alF
    - while read line; do
         echo $line;
         ./sireg-linux exec --loader-sitemap-sitemap \"$line\" >> ./output/${line##*/}_out.txt;
      done < sitemap-index
    - ls -alF output
  artifacts:
    paths:
      - output/*
    expire_in: 1 hrs

And here is the sitemap-index file (only one entry):

http://example.com/sitemap.xml

Both files are in the same directory. I expect a file sitemap.xml_out.txt to be written into the output folder (which is also in that directory). I am pretty sure ./sireg-linux does not execute at all, because it usually takes a few minutes to complete (tested locally).

The output of the stage looks like this:

2020-04-02 18:22:21 (4,26 MB/s) - »sireg-linux« saved [62566347/62566347]

$ chmod 775 sireg-linux
$ mkdir output
$ ls -alF
total 61128
drwxrwxr-x  4 gitlab-runner gitlab-runner     4096 Apr  2 18:22 ./
drwxrwxr-x 10 gitlab-runner gitlab-runner     4096 Apr  2 15:46 ../
drwxrwxr-x  5 gitlab-runner gitlab-runner     4096 Apr  2 18:22 .git/
-rw-rw-r--  1 gitlab-runner gitlab-runner      512 Apr  2 18:22 .gitlab-ci.yml
drwxrwxr-x  2 gitlab-runner gitlab-runner     4096 Apr  2 18:22 output/
-rw-rw-r--  1 gitlab-runner gitlab-runner       30 Apr  2 15:46 README.md
-rwxrwxr-x  1 gitlab-runner gitlab-runner 62566347 Nov 11  2017 sireg-linux*
-rw-rw-r--  1 gitlab-runner gitlab-runner       55 Apr  2 18:08 sitemap-index
$ while read line; do echo $line; ./sireg-linux exec --loader-sitemap-sitemap \"$line\" >> ./output/${line##*/}_out.txt; done < sitemap-index
$ ls -alF output
total 8
drwxrwxr-x 2 gitlab-runner gitlab-runner 4096 Apr  2 18:22 ./
drwxrwxr-x 4 gitlab-runner gitlab-runner 4096 Apr  2 18:22 ../
Uploading artifacts...
Runtime platform                                    arch=amd64 os=linux pid=23813 revision=1f513601 version=11.10.1
WARNING: output/*: no matching files               
ERROR: No files to upload                          
Job succeeded

Update

I tried moving all the steps into a separate script, but that did not work either.

Update 2

I forgot to add exec to the command:

./sireg-linux exec --loader-sitemap-sitemap \"$line\" >> ./output/${line##*/}_out.txt;

Unfortunately, it didn't help.

What can I do to make it work?

mooor

2 Answers

You can of course painfully debug multi-line commands in YAML.

You can even use YAML multi-line strings:

But I would just wrap the code in a shell script, store it in the same GitLab repo, and call it from .gitlab-ci.yml.
This way you can run the script exactly the same way both locally and in CI, which is a best practice in Continuous Delivery.

    - ./script.sh
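
A minimal sketch of what such a script.sh could look like, assuming the loop from the question is moved into it unchanged apart from quoting. The function name crawl_sitemaps and its argument order are hypothetical and chosen only for illustration; the exec --loader-sitemap-sitemap arguments are taken from the question.

```shell
#!/bin/sh
# script.sh: hypothetical wrapper around the loop from the question.

# crawl_sitemaps INDEX_FILE OUTPUT_DIR CRAWLER
# Runs CRAWLER once per non-empty line of INDEX_FILE and appends its
# output to OUTPUT_DIR/<last path segment of the URL>_out.txt.
crawl_sitemaps() {
    index_file=$1
    output_dir=$2
    crawler=$3

    mkdir -p "$output_dir"

    # "|| [ -n "$line" ]" keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # skip empty lines
        echo "$line"
        "$crawler" exec --loader-sitemap-sitemap "$line" \
            >> "$output_dir/${line##*/}_out.txt"
    done < "$index_file"
}

# Only run when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    crawl_sitemaps "$@"
fi
```

With this layout the script takes its inputs as arguments, for example ./script.sh sitemap-index output ./sireg-linux, and the same call works locally and in the CI job.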
Ivan

Try changing ./sireg-linux --loader-sitemap-sitemap \"$line\" to ./sireg-linux exec --loader-sitemap-sitemap "$line". Hope this helps!

EDIT: Also, it looks like the script doesn't enter the while loop at all. Maybe the file sitemap-index is empty or it has only one line without a newline at the end?
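
To illustrate the missing-newline case: read returns a non-zero status when it reaches end of file before a newline, so a plain while read loop skips a final unterminated line even though the variable has been filled. The sketch below compares the plain loop with a commonly used workaround; the function names count_plain and count_robust are made up for this demonstration.

```shell
#!/bin/sh
# count_plain FILE: number of lines a plain "while read" loop processes.
count_plain() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# count_robust FILE: same loop, but a final line without a trailing
# newline is still processed because "$line" is non-empty after read fails.
count_robust() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# One URL, written without a trailing newline.
demo_file=$(mktemp)
printf 'http://example.com/sitemap.xml' > "$demo_file"

echo "plain: $(count_plain "$demo_file")"
echo "robust: $(count_robust "$demo_file")"

rm -f "$demo_file"
```

For the one-line file without a trailing newline this prints "plain: 0" and "robust: 1", which would match a job log where neither the echo nor the crawler produces any output.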

EDIT 2: The backslashes in the command line are wrong; I have corrected my answer.
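
A small sketch of why the backslashes matter, assuming they reach the shell unchanged (the job log in the question shows them in the executed command): \" is a literal quote character for the shell, so the URL is passed to the tool wrapped in quote marks instead of being quoted. The helper show_arg is hypothetical and only prints what it receives.

```shell
#!/bin/sh
# show_arg prints its first argument between angle brackets,
# so any literal quote characters in it become visible.
show_arg() {
    printf '<%s>\n' "$1"
}

line='http://example.com/sitemap.xml'

show_arg \"$line\"   # escaped quotes become part of the argument
show_arg "$line"     # plain double quotes only protect the value
```

The first call prints <"http://example.com/sitemap.xml"> and the second prints <http://example.com/sitemap.xml>.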

Markus Herzog