The goal is a pure Bash function for splitting strings that accepts any string as the delimiter and any string as the input.

QUESTION: How to create a function for splitting strings that accepts any string as input and as delimiter?

REASON FOR QUESTION: There are many, many proposals (see this example) for string splitting with bash commands, but almost all of them only work in specific cases and do not meet the requirement above.

NOTES: We consider the latest versions of the following Linux distributions to be eligible as compatible platforms: Debian, Ubuntu (server and desktop), Arch, RedHat, CentOS, SUSE (server and desktop).

Thanks and be kind!

SOME INPUT TO TEST:

read -r -d '' FILE_CONTENT << 'HEREDOC'
BEGIN

§\\§[+][.][-]
A literal backslash, ‘\’.°

°\a
The “alert” character, Ctrl-g, ASCII code 7 (BEL). (This often makes some sort of audible noise.)

\b
Backspace, Ctrl-h, ASCII code 8 (BS).

\f
Formfeed, Ctrl-l, ASCII code 12 (FF).

\n
Newline, Ctrl-j, ASCII code 10 (LF).

\r
Carriage return, Ctrl-m, ASCII code 13 (CR).

\t
Horizontal TAB, Ctrl-i, ASCII code 9 (HT).

\v
Vertical TAB, Ctrl-k, ASCII code 11 (VT).-

\nnn
The octal value nnn, where nnn stands for 1 to 3 digits between ‘0’ and ‘7’. For example, the code for the ASCII ESC (escape) character is ‘\033’.


15

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – 
fbicknel
 Aug 18, 2017 at 15:57
4

Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' ./  and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray – 
dawg
 Nov 26, 2017 at 22:28 
10

Wow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! – 
artfulrobot
 May 14, 2018 at 11:32
11

I'd move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) – 
paxdiablo
 Jan 9, 2020 at 12:31
44

This is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"§$
END
HEREDOC
F_MS_STR_TO_SPLIT="${FILE_CONTENT:6:-3}"
F_MS_DELIMITER_P="int }' ./  and eliminate"
f_my_answer "$F_MS_STR_TO_SPLIT" "$F_MS_DELIMITER_P"
f_my_answer "$F_MS_STR_TO_SPLIT" "."
f_my_answer "$F_MS_STR_TO_SPLIT" "+"
f_my_answer "$F_MS_STR_TO_SPLIT" "'"
f_my_answer "$F_MS_STR_TO_SPLIT" "\\"
f_my_answer "$F_MS_STR_TO_SPLIT" "-"
f_my_answer "a.+b.+c" "[.][+]"
f_my_answer "a[.][+]b[.][+]c" "[.][+]"
f_my_answer "a.+b.+c" ".+"
Eduardo Lucio
  • Please do not misrepresent the work of others, especially those who offer it at no cost to anyone. If the thread has a problem, point it out and we'll fix it as we've done so far. **Be kind!** – Eduardo Lucio Aug 06 '22 at 21:41
  • 1
    Why do you need awk? `split(){ out=(); local p; local s="$1"; while p="${s%%"$2"*}"; out+=("$p"); s="${s:${#p}}"; ((${#s})); do s="${s:${#2}}"; done; declare -p out; }` – jhnc Aug 06 '22 at 21:55
  • @jhnc The reasons are in the thread itself... For me, awk is an excellent solution. You can propose others as an answer, as long as you meet the requirements (even without awk). For me, no answer with "pure" bash solved it. – Eduardo Lucio Aug 06 '22 at 22:12
  • 1
    If you find string or delimiter that breaks my code, I'll be impressed. – jhnc Aug 06 '22 at 22:19
  • @jhnc I also modified the question so that this answer can help as many people as possible. – Eduardo Lucio Aug 07 '22 at 02:02
  • @jhnc Man I haven't been able to find an answer like that in years! I took many, many tests and they all passed! I honestly don't know what planet you got this skill from, but it turned out to be **more than excellent!** I modified your answer a bit and posted it below. But, I confess that I couldn't quite understand how it works. Thanks! – Eduardo Lucio Aug 07 '22 at 02:02
  • 1
    Glad you have it working now, but I can recall seeing at least one, and probably two different versions of this same question over the past 2 or 3 days. It's much better to edit the original so that all answers are associated with a single question, than to delete and re-ask differing versions of the same question. Advice going forward. – David C. Rankin Aug 07 '22 at 03:52
  • @DavidC.Rankin I made all the changes they asked for, but there was no answer and the question was still closed. So I preferred to delete the originals and create new ones. Then the thing started to happen. Just to justify myself. Thanks! – Eduardo Lucio Aug 07 '22 at 04:23
  • 2
    That's okay, I know for me it was just a little confusing to see more than one version of the question go by. Glad you got it sorted out. – David C. Rankin Aug 07 '22 at 04:24

1 Answer

There are many, many proposals for string splitting with bash commands, but almost all of them only work in specific cases and do not accept any string as input and as delimiter.

The function below, created by jhnc and modified by me, accepts any string as input and as delimiter.

FUNCTION

declare -a F_MASTER_SPLITTER_R; 
f_master_splitter(){
    : 'Splits a given string and returns an array.

    Args:
        F_MS_STR_TO_SPLIT (str): String to split.
        F_MS_DELIMITER_P (Optional[str]): Delimiter used to split. If not
    provided, the string is split on spaces.

    Returns:
        F_MASTER_SPLITTER_R (array): Array with the fields of the provided
    string, separated at each occurrence of the delimiter.
    '

    local F_MS_STR_TO_SPLIT="$1"
    local F_MS_DELIMITER_P="$2"
    if [ -z "$F_MS_DELIMITER_P" ] ; then
        F_MS_DELIMITER_P=" "
    fi
    F_MASTER_SPLITTER_R=();
    local F_MS_ITEM=""
    while
        F_MS_ITEM="${F_MS_STR_TO_SPLIT%%"$F_MS_DELIMITER_P"*}"
        F_MASTER_SPLITTER_R+=("$F_MS_ITEM")
        F_MS_STR_TO_SPLIT="${F_MS_STR_TO_SPLIT:${#F_MS_ITEM}}"
        ((${#F_MS_STR_TO_SPLIT}))
    do
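        # Skip the delimiter itself. Use the variable rather than $2 so that
        # the default delimiter (a space) also advances the string; with
        # ${#2} an omitted delimiter has length 0 and the loop never ends.
        F_MS_STR_TO_SPLIT="${F_MS_STR_TO_SPLIT:${#F_MS_DELIMITER_P}}"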
    done
}
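
HOW IT WORKS

The key step is the expansion ${F_MS_STR_TO_SPLIT%%"$F_MS_DELIMITER_P"*}. Quoting the delimiter inside the pattern makes Bash match it literally, while the unquoted trailing * stays a wildcard, so the expansion drops everything from the first occurrence of the delimiter onward. The sketch below only illustrates that difference; the variable names (s, d, literal, globbed) are made up for the example and are not part of the function.

```bash
#!/usr/bin/env bash
s='a.+b[.][+]c'
d='[.][+]'

# Quoted delimiter: matched character by character, so the cut happens
# at the literal text "[.][+]".
literal="${s%%"$d"*}"

# Unquoted delimiter: treated as a glob pattern, where [.][+] matches ".+",
# so the cut happens much earlier.
globbed="${s%%$d*}"

printf 'literal: %s\n' "$literal"   # literal: a.+b
printf 'globbed: %s\n' "$globbed"   # globbed: a
```

The loop then removes the item it just stored, checks whether anything is left, and skips over one delimiter before repeating.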

USAGE

f_master_splitter "<STR_INPUT>" "<STR_DELIMITER>"
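
The result is stored in the global array F_MASTER_SPLITTER_R, so it is read after the call rather than captured with $(...). A minimal sketch, assuming a condensed copy of the function (shorter local names, same loop) so the snippet runs on its own; the input string is invented for illustration.

```bash
#!/usr/bin/env bash
# Condensed copy of f_master_splitter so this snippet is standalone.
declare -a F_MASTER_SPLITTER_R
f_master_splitter() {
    local s="$1" d="${2:- }" item   # delimiter defaults to a single space
    F_MASTER_SPLITTER_R=()
    while
        item="${s%%"$d"*}"          # text before the first delimiter
        F_MASTER_SPLITTER_R+=("$item")
        s="${s:${#item}}"           # drop the stored item
        ((${#s}))                   # continue only if something is left
    do
        s="${s:${#d}}"              # drop one delimiter
    done
}

f_master_splitter 'alpha::beta gamma::*::' '::'

echo "fields: ${#F_MASTER_SPLITTER_R[@]}"
for field in "${F_MASTER_SPLITTER_R[@]}"; do
    printf '[%s]\n' "$field"
done
```

Quoting "${F_MASTER_SPLITTER_R[@]}" keeps fields with spaces or glob characters intact. Note that an input ending in the delimiter produces an empty last field, so this call yields four fields: alpha, beta gamma, *, and an empty string.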

NOTE: The f_master_splitter function above is available completely free as part of the project ez_i - Create shell script installers easily!


TO TEST (MORE ELABORATE)

read -r -d '' FILE_CONTENT << 'HEREDOC'
BEGIN

§\\§[+][.][-]
A literal backslash, ‘\’.°

°\a
The “alert” character, Ctrl-g, ASCII code 7 (BEL). (This often makes some sort of audible noise.)

\b
Backspace, Ctrl-h, ASCII code 8 (BS).

\f
Formfeed, Ctrl-l, ASCII code 12 (FF).

\n
Newline, Ctrl-j, ASCII code 10 (LF).

\r
Carriage return, Ctrl-m, ASCII code 13 (CR).

\t
Horizontal TAB, Ctrl-i, ASCII code 9 (HT).

\v
Vertical TAB, Ctrl-k, ASCII code 11 (VT).-''%s

\nnn
The octal value nnn, where nnn stands for 1 to 3 digits between ‘0’ and ‘7’. For example, the code for the ASCII ESC (escape) character is ‘\033’.


15

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – 
fbicknel
 Aug 18, 2017 at 15:57
4

Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' %s./ \"and eliminate+.-°\a“\b\f\n\r\t\v\nnn‘’\033`` that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray – 
dawg
 Nov 26, 2017 at 22:28 
10

Wow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! – 
artfulrobot
 May 14, 2018 at 11:32
11

I'd move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) – 
paxdiablo
 Jan 9, 2020 at 12:31
44

This is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"§$
END
HEREDOC
F_MS_STR_TO_SPLIT="${FILE_CONTENT:6:-3}"
F_MS_DELIMITER_P="int }' %s./ \\\"and eliminate+.-°\a“\b\f\n\r\t\v\nnn‘’\033\`\`"

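f_print_my_array() {
    # Print each element of F_MASTER_SPLITTER_R between visible markers.
    # printf is used instead of echo so that fields such as "-n" or "-e"
    # are printed as data and not taken as echo options.
    local F_PMA_ITEM
    for F_PMA_ITEM in "${F_MASTER_SPLITTER_R[@]}"; do
        echo ">>>>>>>>>>"
        printf '%s\n' "$F_PMA_ITEM"
        echo "<<<<<<<<<<"
    done
}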

echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "$F_MS_DELIMITER_P"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "."
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "+"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "'"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "\\"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "$F_MS_STR_TO_SPLIT" "-"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "a.+b.+c" "[.][+]"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "a[.][+]b[.][+]c" "[.][+]"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>>>>>"
f_master_splitter "a.+b.+c" ".+"
f_print_my_array
echo "<<<<<<<<<<<<<<<<<<<<"
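
TO TEST (ROUND TRIP)

Besides reading the printed fields, a split can be checked mechanically: joining the fields back together with the same delimiter should reproduce the original string exactly. The sketch below is only one possible check, assuming a condensed copy of the splitter; the helper f_join_check and the sample string are hypothetical and not part of the original answer.

```bash
#!/usr/bin/env bash
# Condensed copy of f_master_splitter so this snippet is standalone.
declare -a F_MASTER_SPLITTER_R
f_master_splitter() {
    local s="$1" d="${2:- }" item
    F_MASTER_SPLITTER_R=()
    while
        item="${s%%"$d"*}"
        F_MASTER_SPLITTER_R+=("$item")
        s="${s:${#item}}"
        ((${#s}))
    do
        s="${s:${#d}}"
    done
}

# Hypothetical helper: split, re-join with the delimiter, compare to input.
# Returns 0 when the round trip reproduces the input exactly.
f_join_check() {
    local input="$1" delim="$2" joined="" i
    f_master_splitter "$input" "$delim"
    for i in "${!F_MASTER_SPLITTER_R[@]}"; do
        if ((i > 0)); then
            joined+="$delim"
        fi
        joined+="${F_MASTER_SPLITTER_R[i]}"
    done
    [[ "$joined" == "$input" ]]
}

# Sample containing glob characters, a backslash, a quote, and "%s".
sample="a.+b[.][+]c\\d'e%sf"

for case_delim in '[.][+]' '.+' '\' "'" '%s'; do
    if f_join_check "$sample" "$case_delim"; then
        echo "ok:   $case_delim"
    else
        echo "FAIL: $case_delim"
    fi
done
```

The check must be called with a non-empty delimiter, because the splitter substitutes a space for an empty one and the join would then use a different string.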

A very special thanks to jhnc! You rock!

Eduardo Lucio
  • 1
    Your function doesn't work with the sample string you provide in the question. The very first characters should be: `§\\§` but your function outputs `§\§`. And it loses `$string`. – jhnc Aug 06 '22 at 22:28
  • You were right! I adapted your answer here. **Thanks!** – Eduardo Lucio Aug 07 '22 at 01:57
  • 1
    I already posted my code here: https://stackoverflow.com/a/73225463/10971581 – jhnc Aug 07 '22 at 02:09