The goal is a function that splits strings using only awk and accepts any string as input and any string as the delimiter.
How can we write a split function that uses only awk and accepts any string as both input and delimiter?
There are many proposals (see this example) for splitting strings with bash commands, but each of them works only in specific cases and none meets the requirement above.
We present our own code as an example. It is fully functional, but we think several points can be improved, adjusted, or corrected.
Example function (f_split)
F_PRESERVE_BLANK_LINES_R=""
f_preserve_blank_lines() {
    : 'Remove the "single quotes" used to keep blank lines from being erroneously removed.
    The "single quotes" are placed at the beginning and end of a string so that
    blank lines with no other characters in the sequence are not erroneously removed.
    We do not know the reason for this side effect. It occurs, for example,
    in commands that involve "awk".
    Args:
        STR_TO_TREAT_P (str): String to be treated.
    Returns:
        F_PRESERVE_BLANK_LINES_R (str): Treated string.
    '
    F_PRESERVE_BLANK_LINES_R=""
    STR_TO_TREAT_P=$1
    STR_TO_TREAT_P=${STR_TO_TREAT_P%?}
    F_PRESERVE_BLANK_LINES_R=${STR_TO_TREAT_P#?}
}
F_SPLIT_R=()
f_split() {
    : 'Split a given string and return the pieces in an array.
    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not provided, the
            split is done on spaces.
    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the given
            delimiter.
    '
    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi
    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi
    # PROBLEM: This is the only delimiter that has caused trouble so far, but there
    # are probably others. Maybe a scheme using "sed" would solve the problem...
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi
    if [ ${REMOVE_N} -eq 1 ] ; then
        # PROBLEM: Due to certain limitations, we have trouble getting the output
        # of an awk split into an array, so we rely on the line break (\n) as the
        # separator. For that reason we replace the line breaks temporarily and
        # restore them afterwards. The problem is that if the provided string
        # contains a line break, that line break will be lost, that is, it is
        # erroneously removed from the output...
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")
    fi
    # PROBLEM: Replacing "\n" with "3F2C417D448C46918289218B7337FCAF" yields one
    # more occurrence of "3F2C417D448C46918289218B7337FCAF" than there were "\n"
    # in the original string (the extra occurrence is at the end of the string).
    # We cannot explain the reason for this side effect. The line below corrects
    # this problem...
    TARGET_P=${TARGET_P%????????????????????????????????}
    SPLIT_NOW=$(awk -F "$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")
    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")
            # PROBLEM: It would be perfect if we didn't need to use the function below...
            f_preserve_blank_lines "$LN_NOW_WITH_N"
            LN_NOW_WITH_N="$F_PRESERVE_BLANK_LINES_R"
            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}
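The side effects that the comments above flag as unexplained match documented shell and awk behavior. The sketch below tries to reproduce each one in isolation. It is only an illustration and not part of f_split; every variable name in it is hypothetical, and it assumes bash plus any POSIX awk.

```shell
#!/usr/bin/env bash
# Illustrative only: all variable names below are hypothetical.

# 1. A here-string (<<<) appends one newline to its input, which would explain
#    the extra placeholder at the end after gsub("\n", ...).
plain_bytes=$(printf 'ab' | wc -c)
herestring_bytes=$(wc -c <<< 'ab')
echo "$((plain_bytes)) $((herestring_bytes))"    # prints "2 3"

# 2. Command substitution $(...) strips all trailing newlines, which would
#    explain why blank lines vanish unless a guard character follows them.
stripped=$(printf 'x\n\n')
guarded=$(printf "x\n\n'")
kept=${guarded%?}                                 # drop the guard again
echo "${#stripped} ${#kept}"                      # prints "1 3"

# 3. A field separator longer than one character is a regular expression,
#    so "./" means "any character followed by a slash".
regex_fields=$(awk -F './' '{ print NF }' <<< 'axx/b')
literal_fields=$(awk -F '[.]/' '{ print NF }' <<< 'axx/b')
echo "$regex_fields $literal_fields"              # prints "2 1"
```

If this reading is right, the "./" case is not special: any delimiter containing regex metacharacters such as ".", "*", "[", "(" or "|" would need the same kind of escaping when passed through -F.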
Usage
read -r -d '' FILE_CONTENT << 'HEREDOC'
BEGIN
15
It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. –
fbicknel
Aug 18, 2017 at 15:57
4
Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' ./ and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray –
dawg
Nov 26, 2017 at 22:28
10
Wow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! –
artfulrobot
May 14, 2018 at 11:32
11
I'd move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) –
paxdiablo
Jan 9, 2020 at 12:31
44
This is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"
END
HEREDOC
FILE_CONTENT="${FILE_CONTENT:6:-3}"
DELIMITER_P="int }' ./ and eliminate"
f_split "$FILE_CONTENT" "$DELIMITER_P"
LENGTH=${#F_SPLIT_R[@]}
for ((i = 0; i < LENGTH; i++)); do
    echo ">>>>>>>>>>"
    echo "${F_SPLIT_R[$i]}"
    echo "<<<<<<<<<<"
done
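One direction that might meet the requirement, sketched here under several assumptions rather than offered as a definitive answer: hand the delimiter to awk through the environment so that it is never parsed as a regex or as an escape sequence, locate it with index() so the match is literal, and emit NUL-terminated pieces that mapfile reads into an array. The names f_split_sketch, SPLIT_SKETCH_R and DELIM are hypothetical. The sketch assumes Bash 4.4 or later (for mapfile -d) and an awk whose printf "%c", 0 writes a NUL byte, which gawk does; other awk implementations may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): split $1 on the literal
# string $2 and store the pieces in the array SPLIT_SKETCH_R.
SPLIT_SKETCH_R=()
f_split_sketch() {
    local target="$1"
    local delim="${2:- }"    # default to a single space, as f_split does
    # ENVIRON hands the delimiter to awk unchanged; -F would treat it as a
    # regex and -v would interpret backslash escapes.
    mapfile -d '' -t SPLIT_SKETCH_R < <(
        printf '%s\n' "$target" | DELIM="$delim" awk '
            # Rebuild the whole input, newlines included, in one variable.
            { s = (NR == 1 ? $0 : s "\n" $0) }
            END {
                d = ENVIRON["DELIM"]
                # index() searches for a literal substring, never a regex.
                while ((p = index(s, d)) > 0) {
                    printf "%s", substr(s, 1, p - 1)
                    printf "%c", 0
                    s = substr(s, p + length(d))
                }
                printf "%s", s
                printf "%c", 0
            }'
    )
}

# Example call: the middle piece keeps its embedded line break.
f_split_sketch $'a./b\nc./d' "./"
printf '[%s]\n' "${SPLIT_SKETCH_R[@]}"
```

Because the pieces are separated by NUL bytes, which a bash string cannot contain, no placeholder substitution for line breaks is needed, and because the result is read by mapfile rather than through command substitution, trailing line breaks inside a piece are not stripped.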