0

EDIT: I've pushed my code to a new branch just for this question

Link to branch

Link to shell script


I'm trying to create a bash file that loops over all directories (and subdirectories) of a predefined structure (i.e. I know for certain the directories are named in a specific manner).

An obvious solution for this was a case...esac statement using regular expression patterns. So off I go to StackOVerflow where I found this post explaining how to do exactly what I need, so I'm ready to create a test code and this is what I came up with:

test
|
|__ python
|
|__ python-
|
|__ python-a
|
|__ python-0
|
|__ python0-
|
|__ python2
|
|__ pythona
|
|__ test.sh

and run this:

# test.sh
#!/bin/bash
shopt -s extglob;
for dir in $(ls); do
        case $dir in
                python*(-)*([a-zA-Z0-9]))
                        echo "yes -> $dir"
                        ;;
                *)
                        echo "no -> $dir"
                        ;;
        esac
done
shopt -u extglob;

which gives me the following output:

yes -> python
yes -> python-
yes -> python-a
no -> python0-
yes -> python2
yes -> pythona
no -> test.sh

which seems to work fine.

I carried this method on to my actual code:

# actual_code.sh
cd $PROGRAMS_DIR
for language in $(ls -l --group-directories-first | tail -n $(ls -l | awk '{print $1}' | grep d | wc -l) | awk '{print $9}'); do
    cd $language
    echo "language -> $language"
    for algorithm in $(ls -l --group-directories-first | tail -n $(ls -l | awk '{print $1}' | grep d | wc -l) | awk '{print $9}'); do
        cd $algorithm
        echo "algo -> $algorithm"
        shopt -s extglob;
        case $language in
            rust*(-)*([0-9]))
                rustc "${algorithm}_run.rs" -o "${algorithm}_run"
                COMMAND="./${algorithm}_run"
                if [ $TEST -eq 1 ]; then
                    echo "> Running Rust tests for $algorithm"
                    rustc --test "${algorithm}_test.rs" -o "${algorithm}_test"
                    ./${algorithm}_test
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            go*(-)*([0-9]))
                go build -o "${algorithm}_run" .
                COMMAND="./${algorithm}_run"
                if [ $TEST -eq 1 ]; then
                    echo "> Running Go tests for $algorithm"
                    go test
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            java*(-)*([0-9]))
                javac -cp .:$JUNIT:$HAMCREST *.java
                COMMAND="java -cp .:${JUNIT}:${HAMCREST} ${algorithm}_run"
                if [ $TEST -eq 1 ]; then
                    echo "> Running Java tests for $algorithm"
                    java -cp .:${JUNIT}:${HAMCREST} ${algorithm}_test
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            c*(-)*([0-9]))
                # TODO: Try to implement both the normal executable and the -O2 optimisation
                gcc -Wall -c "${algorithm}.c" "${algorithm}_run.c"
                gcc -o "${algorithm}_run" "${algorithm}.o" "${algorithm}_run.o"
                COMMAND="./${algorithm}_run"
                if [ $TEST -eq 1 ]; then
                    echo "> Running C tests for $algorithm"
                    gcc -Wall -c "${algorithm}.c" "${algorithm}_test.c" $UNITY
                    gcc -o "${algorithm}_test" "${algorithm}.o" "${algorithm}_test.o" "unity.o"
                    ./${algorithm}_test
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            python*(-)*([0-9]))
                COMMAND="python ${algorithm}_run.py"
                if [ $TEST -eq 1 ]; then
                    echo "> Running Python tests for $algorithm"
                    pytest .
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            haxe*(-)*([0-9]))
                COMMAND="haxe --main ${algorithm^}_Run.hx --interp"
                if [ $TEST -eq 1 ]; then
                    echo "> Running Haxe tests for $algorithm"
                    haxe --main "${algorithm^}_Test.hx" --library utest --interp -D UTEST_PRINT_TESTS
                    if [ $(echo $?) -ne 0 ]; then
                        exit 1
                    fi
                fi
                ;;
            *)
                echo "($language) has no compilation steps. Did you forget to update the benchmark script?"
                ;;
        esac
        shopt -u extglob;
        .
        .
        .
        some other random code
        .
        .
        .
    done
done

Executing this code now gives me this

./actual_code.sh: line 408: syntax error near unexpected token `('
./actual_code.sh: line 408: `                rust*(-)*([0-9]))'

I'm obviously missing something, but they look the same to me. Also, the echo part doesn't work. It goes straight to the error which is weird as this is an interpreted language.

Mario
  • 339
  • 4
  • 17
  • if you invoke your script like that : `./script.sh` , ensure to put a shebang at top of your file : `#!/bin/bash` – dirtyvader Mar 29 '22 at 11:08
  • Apologies, this is actually a shaved down version of the code, it has the shebang line and works fine if I just remove the `*(-)*([0-9])` from my case patterns. – Mario Mar 29 '22 at 11:10
  • 1
    Perhaps add "set -x" at the top of the script to enable more debugging output? (https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_02_03.html#sect_02_03_01) – wowbagger Mar 29 '22 at 11:12
  • Added that, it seems to just through my initial for-loop that captures CLI flags and then straight into the `unexpected token` error. This is my bash version `$ bash --version GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)` – Mario Mar 29 '22 at 11:16
  • refer to this [pastebin link](https://pastebin.com/Vk3Figat) for the output of `set -x` – Mario Mar 29 '22 at 11:22
  • your _filtering_ in the test-case can be reduced to `shopt -s extglob nullglob; for dir in python*(-)*([[:alnum:]])/; do ...` – Fravadona Mar 29 '22 at 11:23
  • `rust*(-)*([0-9]))` there is an additional closing bracket at the end, which doesn't seem like it gets opened anywhere. – mashuptwice Mar 29 '22 at 11:25
  • @mashuptwice that's the closing bracket that matches the syntax of a case statement (e.g. `case $1 in pattern1) do this ;; pattern2) do this ;; esac`) – Mario Mar 29 '22 at 11:26
  • @Fravadona I don't know how that'd work. As you can see from my code, it's not just `python*(-)*([0-9])` that needs matching, but a number of languages. Also, do you mind explaining the `nullglob` and `:alnum:` parts? – Mario Mar 29 '22 at 11:28
  • `[[:alnum:]]` would be the same as `[0-9a-zA-Z]`. The option `nullglob` means that if a glob doesn't match anything then it will be expanded to nothing instead of returning the glob itself. Check the results of `ls ./I_do_not_exist/*` with and without `nullglob` – Fravadona Mar 29 '22 at 13:19
  • Your code isn't 408 lines long so we can't tell which line the error message refers to, and it has elipses so it's not complete code so we couldn't even copy/paste it into http://shellcheck.net to get useful output. Please create and post a [mcve] that demonstrates your problem along with the error message generated by that minimal code. – Ed Morton Mar 29 '22 at 13:38
  • I see you use variations of `for x in $(ls...)` a lot - that one statement contains 2 different anti-patterns, see https://mywiki.wooledge.org/ParsingLs and https://mywiki.wooledge.org/BashFAQ/001. – Ed Morton Mar 29 '22 at 13:41
  • Before posting your [mcve], please make sure to run it through http://shellcheck.net as the [bash](https://stackoverflow.com/tags/bash/info) tag you used tells you to as there several issues in your current script it'd help you with. – Ed Morton Mar 29 '22 at 13:43
  • @EdMorton I've run my script through the shellcheck website you suggested. Indeed a lot of flags came up but things that were more or less safeguards, rather than errors. After applying the fixes suggested, a ton of other things went wrong with reading some variables and division by 0 etc. As for the `for x in $(ls ...)`, I used to have some lists with hardcoded directory and algorithm names which I wanted to avoid as it's tedious for expansion and leaves a lot to user error. I figured an `ls` with string manipulation would be more efficient. Thank you for the website though! Great tool! – Mario Mar 29 '22 at 14:29
  • If you can post a **minimal** script that has a clean shellcheck (even with `# shellcheck disable` comments for specific issues you've considered and discounted) then you'll definitely get a quick answer to your question. Otherwise you're relying on finding someone willing to wade through all that code, assume the real error isn't in some part of the code we can't see, assume none of the fundamental issues are actually causing your problem, etc. Don't post links or images - just a minimal but complete example without shellcheck-identifiable issues that we can execute to reproduce your problem. – Ed Morton Mar 29 '22 at 14:35
  • Regarding `After applying the fixes suggested, a ton of other things went wrong with reading some variables and division by 0 etc.` - that indicates there are additional problems in your script that shellcheck couldn't detect, fix those too and then let us know if you still have a problem. – Ed Morton Mar 29 '22 at 14:40
  • @EdMorton I thought so too, that this might need some serious work. Unfortunately, this is also my final year project / dissertation, so time is critical and something I don't have at the time. I'll definitely open a branch and an issue as this is a project I plan on carrying with me long after my university degree. For now, though, I just want to figure out the regex issue (though I do appreciate that some underlying issue might be causing it to fail). – Mario Mar 29 '22 at 14:45
  • You're assuming you have a regexp issue. That seems extremely unlikely given you have the same regexp in working code. There's probably some other issue elsewhere in your script, e.g. a mismatched quote or unquoted variable **or something else shellcheck can find**,, and it's just by the time the interpreter gets to the line the error message is coming from that it figures out there's a problem but without seeing a clean [mcve] there's not much we can do to help you debug it. – Ed Morton Mar 29 '22 at 14:52
  • 1
    @KamilCuk this is from running `man bash | grep pattern` ``` *(pattern-list) Matches zero or more occurrences of the given patterns ``` – Mario Mar 29 '22 at 14:52
  • @EdMorton that makes sense. I'll try to create something that is clean and passes shellcheck and update my question accordingly. – Mario Mar 29 '22 at 14:53
  • Sounds good and please make sure it's **minimal**, i.e. no more than about 20 lines. Using divide and conquer on your existing script might be a good place to start. I bet you figure out the problem yourself just by going through that exercise. – Ed Morton Mar 29 '22 at 14:54

2 Answers2

3
1. Do not parse ls

For example, I couldn't make this work:

for language in $(
    ls -l --group-directories-first |
    head -n $(ls -l | awk '{print $1}' | grep d | wc -l) |
    awk '{print $9}'
)
do
    # ...
done

Parsing ls is a lot of things except a good practice. It's simpler and 100% accurate to use globs (with post-filtering when needed).

  • For files:
#!/bin/bash
shopt -s nullglob

for file in ./*
do
    [[ -f "$file" ]] || continue
    # ...
done
  • For directories (even simpler):
#!/bin/bash
shopt -s nullglob

for dir in ./*/
do
    # ...
done
2. Use Makefiles

Having compilation details in the script makes it difficult to maintain; you should delegate it to make by creating a Makefile in each algorithm directory.

Here's a basic example of a Makefile for a Go algorithm directory:

build:
    go build -o REPLACE_ME_WITH_THE_ALGORITHM_NAME_run
test:
    go test

OK, let's suppose that you've written the needed Makefiles (with build and test targets); you can now greatly simplify your script:

#!/bin/bash
shopt -s extglob nullglob

for buildpath in "$PROGRAMS_DIR"/@(rust|go|java|c|python|haxe)*(-)*([0-9])/*/
do
    pushd -- "$buildpath" > /dev/null || continue

    make build

    if [[ $TEST == 1 ]]
    then
        make test || exit 1
    fi

    popd > /dev/null || exit 1
done

As a last remark: the variables in your script should be lower case, unless exported or constants

Fravadona
  • 13,917
  • 1
  • 23
  • 35
0

Akcnowledgements


First of, thank you to everyone who suggested a solution, pointing me towards the right direction, and contributed to this project.

@EdMorton - Although shellcheck was a a good resource and pointed many (possible) flaws in my script, some of the fixes it wanted to apply weren't fixes, but rather a warning towards conventional practices which are not necessary for the actual functionality. I've chosen to follow some but not all as some gave unexpected results. As mentioned in the conversation thread on the original question, this points to some unknown underlying issue(s) and a new branch has been pushed in my repository where the benchmark script will be re-written in such a manner that it passes shellcheck.

@Fravadona - Thank you for the suggestion of a Makefile, this was my original decision but due to my inexperience I'm unable to achieve this in the amount of time I have left (this is still my final year dissertation).

Solution currently in place


I've decided that for the time being, applying a conditional regex matching is the way to go as seen in this commit (as seen below). It's by far the most elegant solution, but works well for now and doesn't add any significant complexity.

# Get rid of the leading './'
language=${language:2:${#language}}
# Capture for '-haxe' postfix of a language or any other pattern
# https://stackoverflow.com/a/18710850/5817020
if [[ $language =~ [a-zA-Z]*-[a-zA-Z0-9]* ]]; then
    readarray -d "-" -t LANGUAGE <<< $language
    lang="${LANGUAGE[0]}"
else
    lang=$language
fi
Mario
  • 339
  • 4
  • 17