
I have already read posts like "How can I store the “find” command results as an array in Bash", "Creating an array from a text file in Bash", and "Store output of command into the array".

Now my issue is the following: How to do this in parallel?

Background:

I have a script that processes a large git repository with a lot of submodules and performs certain actions within them. Some tasks take a while, so in the meantime I want to give the user some feedback to indicate that something is still happening and the code isn't just stuck ^^

I have a function

function ShowSpinner()
{
    pid=$!
    while [ -d /proc/$pid ]
    do
        for x in '-' '/' '|' '\\'
        do
            echo -ne ${x}" \r"
            sleep 0.1
        done
    done
}

for displaying a little spinner during long tasks. So far I use it e.g. like this:

while IFS= read -r line
do
    # Some further processing of the output lines here
done <<< $(git pull 2>&1) & ShowSpinner

which works fine and always displays the spinner until the task is finished.

In particular I use this also for finding submodules in a git repository like

function FindSubmodules()
{
    # find all .git FILES and write the result to the temporary file .submodules
    find -name ".git" -type f > .submodules & ShowSpinner
    # read in the temporary file
    SUBMODULES=$(cat .submodules)
    # and delete the temporary file
    rm .submodules
}

later I iterate the submodules using e.g.

function DoSomethingWith()
{
    for submodule in ${SUBMODULES}
    do
        echo $submodule
    done
}

FindSubmodules

DoSomethingWith

Of course I do more stuff in there, this is only a short example.

This works fine, but what I don't like here is that the file .submodules is created (even if only temporarily). I would prefer to store the result directly in an array and then iterate over that.

So after reading the mentioned posts I tried simply using something like

IFS=$'\n'
SUBMODULES=( $(find -name ".git" -type f)) & ShowSpinner

or from the links also

readarray SUBMODULES < <(find -name ".git" -type f) & ShowSpinner

or

readarray -t SUBMODULES "$(find -name ".git" -type f)" & ShowSpinner

and then iterate like

for submodule in ${SUBMODULES [@]}
do
    echo $submodule
done

For all three options the result is basically the same: the spinner works fine, but all I get is one single entry containing the last character printed by ShowSpinner instead of the results of find. Without the & ShowSpinner it works fine, but of course it doesn't show any feedback during a long task.

What am I doing wrong? How can I get the readarray to work in parallel with the ShowSpinner function?


Update: as suggested, I have put it into a function (actually I already had functions, I just hadn't put the spinner behind the entire function so far)

function FindSubmodules()
{
    echo ""
    echo ${BOLD}"Scanning for Submodules ...  "${NORMAL}
    
    SUBMODULES=($(find -name ".git" -type f))
    
    for submodule in "${SUBMODULES[@]}"
    do
        echo $submodule
    done
}

function CheckAllReposForChanges()
{
    # Check Submodules first
    for submodule in "${SUBMODULES[@]}"
    do
        # remove prefixed '.'
        local removedPrefix=${submodule#.}
        # remove suffix '.git'
        local removedSuffix=${removedPrefix%.git}

        echo "${BASEPATH}${removedSuffix}"
    done
    
    # Check the main repo itself
    echo "${BASEPATH}"

    echo ""
}

FindSubmodules & ShowSpinner

CheckAllReposForChanges

The CheckAllReposForChanges function itself works just fine.

What I get now is the spinner and then the correct output from the first FindSubmodules like e.g.

./SomeFolder/.git
./SomeOtherFolder/.git
./SomeThirdFolder/.git

etc

However, when it comes to CheckAllReposForChanges (again, the echo is just an example for debugging) I don't get any output except the main repository path. It seems SUBMODULES is now empty since it is being filled in the background. It worked with the solution I used originally.

derHugo
  • Nothing here seems to require an array anyway. I don't see how you can get the result you report. Please provide a [mre]. – tripleee Oct 15 '20 at 10:16
  • In any Git repo I am familiar with, `.git` will always be a directory. So your `find` command simply doesn't produce any output. – tripleee Oct 15 '20 at 10:23
  • @tripleee it sounds like you have never worked with git [submodules](https://www.git-scm.com/book/en/v2/Git-Tools-Submodules) so far. Each submodule has a **file** `.git` so this is a minimal reproducible example. Anyway this isn't really relevant. If you want you can replace `.git` by any other file name. As said .. it basically works without the `ShowSpinner` .. by that I meant it works and I get my submodules listed (about 25) – derHugo Oct 15 '20 at 10:28
  • The part I can't see how to repro is where you say you get the last character from the spinner. *Where* do you get that result; from running what code exactly? – tripleee Oct 15 '20 at 10:34
  • @tripleee anything using that array later like `for submodule in ${SUBMODULES [@]}` etc .. it only echos one `/` or `|` etc depending what the last char of the spinner was ;) – derHugo Oct 15 '20 at 10:36
  • For something this complex, I'd use Python. – devinbost Oct 25 '21 at 20:51

3 Answers


Of course the array is empty; you have backgrounded the function which will eventually populate it (and anyway, there is no way really for the background process to populate a variable in its parent once it finishes).
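A minimal sketch of that effect, assuming bash; the function `fill` and the array `items` are made-up names for illustration and are not part of the question's script. The backgrounded call runs in a forked copy of the shell, so its assignment never reaches the parent:

```shell
#!/usr/bin/env bash
# Hypothetical demo: fill() and "items" are illustrative names only.
items=()

fill() {
    items=(one two three)
}

fill &                      # runs in a subshell (a forked copy of this shell)
wait                        # let the background job finish
bg_count=${#items[@]}       # the parent's array is still empty
echo "after background call: ${bg_count} element(s)"

fill                        # runs in the current shell
echo "after foreground call: ${#items[@]} element(s)"
```

The first line reports 0 elements and the second reports 3, which matches the empty SUBMODULES seen after `FindSubmodules & ShowSpinner`.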

Run both functions in the same background process and they will be able to communicate properly.

{ FindSubmodules
  CheckAllReposForChanges
} & ShowSpinner
tripleee
  • As a further aside, don't use upper case for your private variables; see https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization – tripleee Oct 15 '20 at 11:17
  • Unfortunately this is also not an option. I don't need/want the spinner all the time. The script contains 8 more phases where internally at different points I use a spinner where needed (as shown in the question e.g. during `git pull`) .. wrapping everything with a spinner is not a solution unfortunately. I appreciate your efforts and inputs a lot though! Will consider the naming as well – derHugo Oct 15 '20 at 12:51
  • So make the background processes communicate to the parent when the spinner should or should not run. Again, your code and your question fail to reveal your actual requirements. Maybe accept this answer (which at the very least explains what's wrong) and ask a new question with your *actual* requirements, and hopefully finally a representative [mre]. – tripleee Oct 15 '20 at 13:00

Maybe I'm misreading the question but it seems (to me) the requirement is to pass data 'up' from a backgrounded/child process to the calling/parent process.

A backgrounded script/function call spawns a new, asynchronous OS-level process; there is no easy way to pass data 'up' from the child process to the parent process.

While it may be possible to build some sort of inter-process shared memory structure to share data between parent and child processes, it's a bit easier if we can use some sort of intermediate storage (eg, fifo, file, database table, queuing system, etc) that the various processes can 'share'.

One idea:

  • parent process creates one or more temp directories (eg, one for each distinct array to be populated)
  • each child process writes data to a file (filename = ${BASHPID}) in a particular temp directory in a format that can be easily parsed (and loaded into an array) by the parent
  • the parent calls the child process, waits for the child process to complete, and then ...
  • the parent process reads the contents of all files in the temporary directory(s) and loads the appropriate array(s)

For sake of an example I'll assume we just need to populate a single array; I'm also going to use the same temp directory for capturing/storing modules for the functions regardless of whether each function is run in the background or foreground:

unset submodules                                     # delete any variable with this name
submodules=()                                        # init array

outdir=$(mktemp -d)                                  # create temp directory

FindSubmodules()
{
    ... snip ...
        echo "$submodule" >> "${outdir}/${BASHPID}"  # write module to temp file
    ... snip ...
}

CheckAllReposForChanges()
{
    ... snip ...
        echo "$submodule" >> "${outdir}/${BASHPID}"  # write module to temp file
    ... snip ...
}

FindSubmodules & ShowSpinner

CheckAllReposForChanges

# now pull modules from temp file(s) into array; NOTE: assumes each temp file contains a single module name on each line, and no blank lines, otherwise OP can add some logic to address a different format

while IFS= read -r modname                           # IFS= keeps leading/trailing whitespace
do
    submodules+=("${modname}")                       # quote so names with spaces stay one element
done < <(cat "${outdir}"/[0-9]*)

# remove temp directory and file(s)

'rm' -rf "${outdir}"
markp-fuso
  • hmm maybe the question is a bit misleading due to the update at the bottom I tried after it was suggested by a now deleted answer. Originally both methods do not run in a background but rather only `find`. You now suggest to create a directory and multiple files instead of me originally only wanting to prevent to write and read from a single file but rather directly into an array variable while being able to do it in background – derHugo Oct 15 '20 at 13:24
  • early in the question you ask: "*How to do this in parallel?*" which implies (to me) that at some point something will need to be kicked off in the background; the key point of my answer is there's no easy way to pass data 'up' from an asynchronous backgrounded child process to the parent process; as for the rest of the (lengthy) question ... yeah, I'm still not 100% sure what the question is :-) – markp-fuso Oct 15 '20 at 13:52
  • in short my question goes: `find -name ".git" -type f > .submodules & ShowSpinner; SUBMODULES=$(cat .submodules); rm .submodules` works, showing a spinner and filling the variable. But I want to do the same with a spinner but without creating any additional files ^^ Honestly I'm in a state where I'd like to delete the question but I can't since there are answers here :D – derHugo Oct 15 '20 at 13:58
  • to run a `find` command and the 'spinner' at the same time requires 2x separate processes, which means at least one of those 2x processes has to be run in the background, which means the 'background' process is spawned as an asynchronous OS-level call, at which point you have to come up with a means of passing data between 2x disparate OS processes (eg, shared memory construct, file, pipe, etc); what's the issue with creating a file? – markp-fuso Oct 15 '20 at 14:09
  • The issue is the file is added to that git repo .. only for very short but enough to force the git IDE (in our case SourceTree) to refresh the entire repo state which takes a while and eats resources. I understand now where the problem lies and it seems the file for now is the best option I have ^^ Anyway thanks for your effort and input! – derHugo Oct 15 '20 at 14:18
  • wouldn't this be as 'easy' as placing the temporary file somewhere outside the boundaries of what the git IDE sees/has-access-to? based on your latest comment it sounds like the **real** issue is extracting the `find` results, via a temporary file (due to backgrounding the `find`), while making sure the creation of the temporary file does not trigger an update of the git repository ... ? – markp-fuso Oct 15 '20 at 14:20
  • That could of course be an option but I prefer to keep the repo sandboxed. This is e.g. also running on Jenkins and I don't want to create new files out of that workspace so in case of an error it would stay there lying around somewhere – derHugo Oct 15 '20 at 14:23

If you can write your parallelism using GNU Parallel, you can use parset:

dostuff() {
  #  Do real stuff here
  sleep 10;
}
export -f dostuff

function ShowSpinner()
{
    while [ -d /proc/$pid ]
    do
        for x in '-' '/' '|' '\\'
        do
            echo -ne ${x}" \r"
            sleep 0.1
        done
    done
}

sleep 1000000 &
pid=$!
ShowSpinner &
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo

Or (if you are willing to change ShowSpinner):

dostuff() {
  #  Do real stuff here
  sleep 10;
}
export -f dostuff

function ShowSpinner()
{
    while true; do
        for x in '-' '/' '|' '\\'
        do
            echo -ne ${x}" \r"
            sleep 0.1
        done
    done
}

ShowSpinner &
pid=$!
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo
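
For comparison, here is a sketch of the same "spinner in the background, work in the foreground" arrangement without GNU Parallel. It assumes bash 4 or newer (for `readarray`); the names `spin`, `scan_dir`, `spinner_pid` and `submodules` are illustrative and not taken from the question's script. Because `readarray` runs in the current shell and only `find` and the spinner run as separate processes, the array is still filled after the spinner is killed, and no temporary file is needed:

```shell
#!/usr/bin/env bash
# Sketch only: spin(), scan_dir, spinner_pid and submodules are hypothetical names.

spin() {
    # Loop forever; the caller kills this job once the real work is done.
    while true; do
        for x in '-' '/' '|' '\'; do
            printf '%s \r' "$x" >&2      # draw on stderr so stdout stays clean
            sleep 0.1
        done
    done
}

scan_dir=${1:-.}                         # directory to scan, defaults to the current one

spin &                                   # the spinner is the background job
spinner_pid=$!

# find runs in a process substitution; readarray runs in this shell,
# so the array is populated in the parent.
readarray -t submodules < <(find "$scan_dir" -name ".git" -type f)

kill "$spinner_pid"
wait "$spinner_pid" 2>/dev/null || true  # reap the job and hide the "Terminated" notice
printf '  \r' >&2                        # erase the last spinner character

printf 'found %d submodule(s)\n' "${#submodules[@]}"
```

The design choice is simply to invert which side is backgrounded: the spinner has no result to hand back, so it is the one that can safely live in a child process.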
Ole Tange