Getting whitespace characters from grep into an array

Question

Where I started:

input="a s d f"
content=(`grep -o . <<< "$input"`)
echo ${#content[@]}
for ((a = 0; a < ${#content[@]}; a++)); do
    token="${content[a]}"
    echo "$token"
done
read -p ''

echoes:

4
a
s
d
f

Although the grep command captures the whitespace, the whitespace characters are lost when the array is constructed, presumably because spaces act as separators when an array is defined.

What I want:

content=({a,\ ,s,\ ,d,\ ,f})
echo ${#content[@]}
for ((a = 0; a < ${#content[@]}; a++)); do
    token="${content[a]}"
    echo "$token"
done
read -p ''

echoes:

7
a

s

d

f

The array length is 7 and the spaces are stored as their own elements. This is what I'm trying to get. However, in this example the input is hard-coded. I'm trying to reach this point from any input string.

What I have:

input="a s d f"
content=({`grep -o . <<< "$input" | sed 's/ /\\\ /g' | sed 's/.*/&,/g'`})
echo ${#content[@]}
for ((a = 0; a < ${#content[@]}; a++)); do
    token="${content[a]}"
    echo "$token"
done
read -p ''

echoes:

10
{a,
\
,
s,
\
,
d,
\
,
f,}

So I tried to use sed to reformat the grep output so that it would match the pattern in my second example. As you can see, this did not work the way I expected.

My Question:

How can I get the result in my second example while still using an input variable to construct the array? Am I just making a stupid mistake? Is this just a bad way to do this in general? Any help would be appreciated.

@shelter no, `grep -o .` will split the string into characters, putting every char onto its own line. — Ed Morton, Dec 13 '17 at 00:11
Also note that you don't need brace expansion `content=({a,\ ,s,\ ,d,\ ,f})` to assign a space into an array. You could do this for example: `content=(a " " s " " d " " f)` or `content=(a \ s \ d \ f)`. — PesaThe, Dec 13 '17 at 01:11

PesaThe · Accepted Answer · 2017-12-17T16:41:42.783

First thing that comes to mind (works well with <newline> as well):

input="a s d f"
content=()
for ((i=0; i<${#input}; i++)); do 
   content+=("${input:$i:1}")
done

echo ${#content[@]}
printf '%s\n' "${content[@]}"

Outputs:

7
a

s

d

f
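
As a minimal sketch, the same loop can be wrapped in a function and tried on input that also contains a tab and a newline. The name split_chars is hypothetical (it is not part of the original answer), and it fills a global array called content:

```shell
# split_chars is a hypothetical helper name; it fills the global array "content".
split_chars() {
    local s=$1 i
    content=()
    for ((i = 0; i < ${#s}; i++)); do
        content+=("${s:i:1}")   # one character per element, whitespace included
    done
}

split_chars $'a b\tc\nd'
printf '%d\n' "${#content[@]}"   # prints 7
printf '%q\n' "${content[@]}"    # %q makes the space, tab and newline visible
```

Because the loop indexes into the string directly, nothing is ever word-split, so no IFS handling is needed.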

Other ways to do this (shorter but these will ignore any <newline> in $input):

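set -f #to prevent globbing
old_IFS=$IFS
IFS=$'\n' #split on newlines only, so a line holding a single space survives
content=($(grep -o . <<< "$input"))
IFS=$old_IFS
set +f #re-enable globbing (skip this if it was already disabled)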

IFS=$'\n' read -r -a content -d '' < <(grep -o . <<< "$input")

readarray -t content < <(grep -o . <<< "$input")
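
To see why these variants keep the spaces where the assignment in the question did not, here is a small side-by-side comparison. The array names lost and kept are only illustrative:

```shell
input="a s d f"

# Default IFS is space, tab and newline: the lines that contain only a
# space are treated as separators and disappear during word splitting.
lost=( $(grep -o . <<< "$input") )

# readarray splits on newlines only, so each single-space line is kept.
readarray -t kept < <(grep -o . <<< "$input")

printf 'unquoted: %d, readarray: %d\n' "${#lost[@]}" "${#kept[@]}"
# prints: unquoted: 4, readarray: 7
```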

Other ways that will not ignore <newline>:

content=()
while IFS= read -r -d '' char; do
   content+=("$char")
done < <(grep -z -o . <<< "$input")
unset "content[${#content[@]}-1]" #trims the final newline

A better version of the readarray solution, suggested by @EdMorton (requires bash 4.4):

readarray -d '' -t content < <(grep -z -o . <<< "$input")
unset "content[${#content[@]}-1]" #trims the final newline

ghoti · Answer 2 · 2017-12-13T13:46:38.443

You can change how word splitting works by adjusting IFS, and you can capture single characters with read -n 1. For example:

$ input="a b c d"
$ while IFS= read -n 1 token; do echo "> _${token}_"; done <<<"$input"
> _a_
> _ _
> _b_
> _ _
> _c_
> _ _
> _d_
> __

The final blank is shown because here-string redirection (<<<) appends a newline to the input you supply.
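
A quick way to confirm the extra newline is to count bytes. This is a small sketch; the variable names are only illustrative, and the $(( )) wrapper just strips the padding some wc implementations print:

```shell
input="a b c d"

with_herestring=$(wc -c <<< "$input")          # <<< adds a trailing newline
with_printf=$(printf '%s' "$input" | wc -c)    # printf '%s' adds nothing

echo "here-string: $((with_herestring)) bytes, printf: $((with_printf)) bytes"
# prints: here-string: 8 bytes, printf: 7 bytes
```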

If you want to store these characters in an array, you can append as you go, as PesaThe suggested:

$ declare -a content=()
$ while IFS= read -n 1 token; do content+=("$token"); done <<<"$input"
$ declare -p content
declare -a content=([0]="a" [1]=" " [2]="b" [3]=" " [4]="c" [5]=" " [6]="d" [7]="")

We can now remove that final empty element (produced by the trailing newline) from the array:

unset "content[$((${#content[@]}-1))]"

From there, you can format your output with printf however you like, using the array to provide content:

$ printf '%d\n' "${#content[@]}"; printf '%s\n' "${content[@]}"
7
a

b

c

d
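
A possible variation that avoids the trailing empty element altogether, sketched here under the assumption that your bash supports read -N (available since 4.1, as far as I know): feed the string with printf '%s', which adds no newline, and use -N 1 so that a newline inside the input is stored like any other character instead of ending the read.

```shell
input=$'a b\nc d'
content=()

# printf '%s' adds no trailing newline, unlike <<<; read -N 1 takes exactly
# one character and does not treat newline as a delimiter.
while IFS= read -r -N 1 token; do
    content+=("$token")
done < <(printf '%s' "$input")

printf '%d\n' "${#content[@]}"   # prints 7
```

The -r flag keeps backslashes literal, so an input such as 'a\b' is stored as three characters.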
