
I have a bash array (called tenantlist_array below) populated with elements in the following format:

{3 characters}-{3-5 characters}{3-5 digits}-{2 chars}{1-2 digits}.

Examples:

abc-hac101-bb0
xyz-b2blo97250-aa99
abc-b2b9912-xy00
fff-hac101-g3

Array elements are unique. Note that the hyphens are part of every array element.

I need to check whether the supplied string (the variable tenant in the example below) exactly matches any array element. Because the elements are unique, the first match is sufficient.

I am iterating over the array elements with this simple code:

tenant="$1"

for k in "${tenantlist_array[@]}"; do
        # Starts a subshell and a separate grep process for every element
        result=$(grep -x -- "$tenant" <<<"$k")
        if [[ $result ]]; then
            break
        fi
done

Please note: I need a full string match. If, for example, the string I am searching for is hac101, it must not match any array element, even though it is a substring of some of them.

In other words, only the full string abc-hac101-bb0 may match the first element; strings such as abc, abc-hac, b2b, 99 or - must not match. That is why grep is called with the -x option.
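To make the requirement concrete, here is a minimal sketch in pure bash that contrasts a substring test with a full-string test, using the first example element and the candidate strings mentioned above. The function names substring_match and full_match are hypothetical, chosen only for illustration. Every candidate passes the substring test, but only the identical string passes the full-string test.

```shell
#!/usr/bin/env bash
# substring_match ELEMENT CANDIDATE (hypothetical helper):
# succeeds if CANDIDATE occurs anywhere inside ELEMENT.
substring_match() { [[ $1 == *"$2"* ]]; }

# full_match ELEMENT CANDIDATE (hypothetical helper):
# succeeds only if the two strings are identical (quoted RHS = literal).
full_match() { [[ $1 == "$2" ]]; }

element='abc-hac101-bb0'
for candidate in 'abc-hac101-bb0' 'abc' 'abc-hac' 'hac101' '-'; do
    sub=no
    full=no
    substring_match "$element" "$candidate" && sub=yes
    full_match "$element" "$candidate" && full=yes
    printf '%-16s substring=%s full=%s\n' "$candidate" "$sub" "$full"
done
```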

Now, the above code works, but I find it quite slow. With 193 elements in the array, iterating over them takes more than a minute on an ordinary notebook:

real    1m2.541s
user    0m0.500s
sys     0m24.063s

And with 385 elements in the array, the timings are:

real    2m8.618s
user    0m0.906s
sys     0m48.094s

So my question is: is there a faster way to do this?

Invisible999
  • If you need a full string match then you don't need to use a regex (even if you do, you don't need to spawn a subshell / use grep). You should be able to simply test for equality `if [[ "$1" = "$k" ]]; then ...` – arco444 Nov 12 '20 at 08:33
  • @arco444 you are correct! Question - do you see any potential drawbacks? – Invisible999 Nov 12 '20 at 10:46
  • No I don't see any drawbacks - the operation is literally intended for this purpose! – arco444 Nov 12 '20 at 11:38
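One detail worth knowing about this suggestion: inside [[ ]], an unquoted right-hand side of = or == is treated as a glob pattern, while a quoted right-hand side is compared literally. The sketch below uses a hypothetical supplied string abc-* to show the difference. Quoting the right-hand side, as the comment above does, keeps the test an exact full-string comparison.

```shell
#!/usr/bin/env bash
# Inside [[ ]], an unquoted right-hand side of = is a glob pattern,
# while a quoted right-hand side is compared literally.
k='abc-hac101-bb0'
tenant='abc-*'    # hypothetical input containing a glob character

unquoted=no
quoted=no
[[ $k = $tenant ]] && unquoted=yes     # pattern match: abc-* matches k
[[ $k = "$tenant" ]] && quoted=yes     # literal match: k is not the string abc-*
echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=yes quoted=no
```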

3 Answers


Without running any loop, you can do this using a glob pattern:

tenant="$1"

[[ $(printf '\3%s\3' "${tenantlist_array[@]}") == *$'\3'"$tenant"$'\3'* ]] &&
echo "ok" || echo "no"

In printf, we wrap each element in the control character \3, and in the comparison we place \3 before and after the search key, so the pattern can only match a complete element.
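The same technique wrapped in a reusable function, as a sketch. The name in_array is hypothetical, and the approach assumes no element ever contains the \003 byte used as the delimiter. One small change from the one-liner: printf -v writes straight into a variable, so no command substitution is needed.

```shell
#!/usr/bin/env bash
# in_array KEY ELEMENT... (hypothetical helper): succeeds if KEY equals
# one of the ELEMENTs. Assumes no element contains the \003 byte.
in_array() {
    local key=$1 joined
    shift
    # printf reuses the format for each argument: \3el1\3\3el2\3...
    printf -v joined '\3%s\3' "$@"
    # The delimiters on both sides force a match on a complete element;
    # "$key" is quoted, so any glob characters in it are taken literally.
    [[ $joined == *$'\3'"$key"$'\3'* ]]
}

tenantlist_array=(abc-hac101-bb0 xyz-b2blo97250-aa99 abc-b2b9912-xy00 fff-hac101-g3)
in_array 'abc-hac101-bb0' "${tenantlist_array[@]}" && echo ok || echo no   # ok
in_array 'hac101' "${tenantlist_array[@]}" && echo ok || echo no           # no
```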

anubhava
  • Let me try to explain once again, because there seems to be a misunderstanding. I already have the search string in hand; I don't need to check whether that string conforms to the format of the array elements. What I need is to check whether that string matches any array element 1:1. – Invisible999 Nov 12 '20 at 10:36
  • I think it was clear in this description: 'I need to check if the supplied string produces a full match with the array element.' – Invisible999 Nov 12 '20 at 10:53

Thanks to @arco444, the solution is astonishingly simple:

tenant="$1"

for k in "${tenantlist_array[@]}"; do
    if [[ $k = "$tenant" ]]; then
        result="$k"
        break
    fi
done

And here is the speed difference for the 385-member array:

real    0m0.007s
user    0m0.000s
sys     0m0.000s

That is thousands of times faster.

This gives an idea of how wasteful it is to start a subshell and a grep process for every element; that should be avoided where possible.
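A sketch of the same equality test wrapped in a function that reports the result through its exit status (the name has_tenant is hypothetical). Because the match is exact, the matching element is identical to $tenant, so the caller does not need the element handed back; it can simply set result=$tenant when the function succeeds.

```shell
#!/usr/bin/env bash
# has_tenant KEY ELEMENT... (hypothetical helper): returns 0 on the first
# element equal to KEY, 1 if none matches. Pure bash, no external processes.
has_tenant() {
    local key=$1 k
    shift
    for k in "$@"; do
        # Quoted right-hand side: literal, full-string comparison.
        # Elements are unique, so returning at the first match is enough.
        [[ $k == "$key" ]] && return 0
    done
    return 1
}

tenantlist_array=(abc-hac101-bb0 xyz-b2blo97250-aa99 abc-b2b9912-xy00 fff-hac101-g3)
tenant='abc-b2b9912-xy00'   # example value; the question takes it from $1
result=
if has_tenant "$tenant" "${tenantlist_array[@]}"; then
    result=$tenant
fi
echo "result=${result:-<none>}"
```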

Invisible999
  • There is a typo in the code: it should be `==`, not `=`. Moreover, bash is really slow with this kind of stuff... consider using awk or, even easier, grep. – Riccardo Petraglia Nov 12 '20 at 10:57
  • @RiccardoPetraglia this is not a typo. See https://stackoverflow.com/questions/20449543/shell-equality-operators-eq. Also please read the question carefully. OP has already stated they have an array in `bash` and using `grep` within the script was causing performance issues. – arco444 Nov 12 '20 at 11:38
  • @arco444 Thank you for pointing me to the explanation about `=` and `==` that was very helpful! About the `grep`: their method is slow because they are using grep in a completely wrong way. I am going to add an answer with the way I would use grep. – Riccardo Petraglia Nov 13 '20 at 13:22

This is an alternative way of using grep that makes much better use of its power.

The code that "formats" the array could be removed entirely by appending a \n to each element when the array is first built.

This code should also degrade much more slowly as the compared strings and the array get longer.

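tenant="$1"

# Join the elements into one string, one element per line
formatted_array=""
for k in "${tenantlist_array[@]}"; do
        formatted_array+="$k"$'\n'
done

# A single grep call: -x matches whole lines only, -F treats $tenant literally
result=$(printf '%s' "$formatted_array" | grep -xF -- "$tenant")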
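A variant of this answer's idea, as a sketch: printf '%s\n' repeats its format for every array element, so it produces the newline-separated input without an explicit loop, and one grep then does the lookup. The name grep_tenant is hypothetical. The -x and -F options keep it a full, literal string match, as the question requires. Note that each lookup still starts a subshell and a grep process, which is the cost the pure-bash comparison avoids.

```shell
#!/usr/bin/env bash
# grep_tenant KEY ELEMENT... (hypothetical helper): succeeds if KEY equals
# one of the ELEMENTs, using a single grep over all of them.
grep_tenant() {
    local key=$1
    shift
    # printf reuses '%s\n' for every argument: one element per line.
    #   -x  the whole line must match (no substring matches)
    #   -F  KEY is a fixed string, not a regular expression
    # Output is discarded; grep's exit status says whether a line matched.
    printf '%s\n' "$@" | grep -xF -- "$key" > /dev/null
}

tenantlist_array=(abc-hac101-bb0 xyz-b2blo97250-aa99 abc-b2b9912-xy00 fff-hac101-g3)
grep_tenant 'xyz-b2blo97250-aa99' "${tenantlist_array[@]}" && echo ok || echo no   # ok
grep_tenant 'b2b' "${tenantlist_array[@]}" && echo ok || echo no                   # no
```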
Riccardo Petraglia
  • Couple of problems here. 1. There's very little point in looping the array just for the purposes of constructing a string so you can `grep` it, you may as well do the check for the wanted value in the loop and break, as OP is already doing. 2. `$result` will contain the entire string, not the desired element. If you want a `grep` solution, you can skip the loop and say `result=$(echo -e ${tenantlist_array[@]} | grep -o "$tenant")` – arco444 Nov 13 '20 at 13:32
  • @arco444 OK on point 1. But, as I said in the answer, you could build the array with a `\n` suffix, and I think we both agree that the performance of `grep` is superior to the performance of comparing strings with bash. As for point 2, this is probably the best and fastest solution. – Riccardo Petraglia Nov 13 '20 at 13:39
  • I certainly don't agree about any superior performance in this case - i.e. spawning a subshell to run a process to check the value of a variable that is already in the memory of the parent process. Doing the check without grep is almost certainly faster. Regardless, any superior performance is still totally lost by using a loop and not exiting it at the earliest possible opportunity – arco444 Nov 13 '20 at 13:49
  • I said: "the performance of grep is superior to the performance of comparing strings with bash". You can run any test you like on that... it could depend on the length of the strings and of the array but... come on... You cannot really argue with this. – Riccardo Petraglia Nov 13 '20 at 13:56
  • "Regardless, any superior performance is still totally lost by using a loop and not exiting it at the earliest possible opportunity" If you are referring to my loop that adds the `\n`, that is why I said that using `grep` the way you proposed was the best solution. Still, if you can add the `\n` when building the array, performance will be better than comparing strings with bash (for "long enough" strings). – Riccardo Petraglia Nov 13 '20 at 14:00
  • I think the result I've posted when I eliminated grep call speaks for itself. I am dealing with a 'just' 300 member array, but in the real environment, it might be several thousand array members and the string I am going to match might be on the first element. I need the code run fast and anything towards this goal is better. – Invisible999 Nov 13 '20 at 15:43
  • @Invisible999 `grep` is, most of the time, more efficient than a bash comparison, especially for large quantities (otherwise grep would just be a bash script instead of complex C software). As arco444 suggested, try `result=$(echo -e ${tenantlist_array[@]} | grep -o "$tenant")` instead of the if... – Riccardo Petraglia Nov 13 '20 at 17:50
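A quick check of the one-liner suggested in these comments against the question's full-match requirement, as a sketch using the example elements from the question. Joining the elements with spaces on one line and running grep -o reports every substring occurrence, so hac101 is found twice even though it is not an element. Putting one element per line and using grep -xF finds nothing, which is what the question requires.

```shell
#!/usr/bin/env bash
tenantlist_array=(abc-hac101-bb0 xyz-b2blo97250-aa99 abc-b2b9912-xy00 fff-hac101-g3)
tenant='hac101'   # a substring of two elements, but not an element itself

# The suggested form: elements joined by spaces on one line, then grep -o,
# which prints every occurrence of $tenant anywhere in that line.
loose=$(echo -e ${tenantlist_array[@]} | grep -o "$tenant")

# One element per line; -x requires a whole-line match, -F a literal one.
# "|| true" because grep exits with status 1 when nothing matches.
strict=$(printf '%s\n' "${tenantlist_array[@]}" | grep -xF -- "$tenant" || true)

echo "loose matches:"
printf '%s\n' "$loose"
echo "strict match: ${strict:-<none>}"
```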