read csv output into an array and process the variable in a loop using bash

Question

Assuming I have an output/file:

1,a,info
2,b,inf
3,c,in

I want to run a while loop with read

while read r ; do 
   echo "$r";
   # extract line to $arr as array separated by ',' 
   # call some program (e.g. md5sum, echo ...) on one item of arr
done <<HEREDOC
1,a,info
2,b,inf
3,c,in   
HEREDOC

I would like to use readarray and while, but compelling alternatives are welcome too.

There is a specific way to make readarray (mapfile) behave correctly with process substitution, but I keep forgetting it. This is intended as a Q&A, so an explanation would be nice.

Regarding `I specifically want to use readarray and while` - why? — Ed Morton, Jan 16 '23 at 15:01
@EdMorton Because I wanted a simple resource to go to that gave away the answer to this particular problem quickly. Anyway, I accepted your answer. The better solution is the better solution. If you have a way of "retroactive continuity"-ing my question so that it fits better with all the answers, please go for it, though maybe it is good that people can find an alternative solution to what they were searching for. – Summer-Sky, Jan 18 '23 at 15:11
The close is unwarranted because the "duplicate" question does not come up in a search for read, array and while loop. Also, I was specifically asking about readarray and the special ways it behaves! It doesn't matter if there is a better way around it! – Summer-Sky, Jan 23 '23 at 09:21

Ed Morton · Accepted Answer · 2023-01-16T14:57:05.960

Since compelling alternatives are welcome too and assuming you're just trying to populate arr one line at a time:

$ cat tst.sh
#!/usr/bin/env bash

while IFS=',' read -r -a arr ; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

$ ./tst.sh
1
2
3

Or, if you also need each whole input line in a separate variable r:

$ cat tst.sh
#!/usr/bin/env bash

while IFS= read -r r ; do
    # extract line to $arr as array separated by ','
    # echo the first item of arr
    IFS=',' read -r -a arr <<< "$r"
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

$ ./tst.sh
1
2
3

But bear in mind "Why is using a shell loop to process text considered bad practice?" (https://unix.stackexchange.com/questions/169716) anyway.
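For comparison, the kind of alternative that Q&A recommends for a pure text transformation is one tool reading the whole input instead of a read per line. A minimal sketch, assuming all you want is the first field of each line as in the scripts above (print_first_field is a made-up name for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first comma-separated field of every
# input line using a single awk process instead of a bash read loop.
print_first_field() {
    awk -F',' '{ print $1 }'
}

print_first_field <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
```

This only pays off when the per-line work is itself text processing; once each line has to feed an external command, a shell loop is the natural fit.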

edited Jan 16 '23 at 14:57

answered Jan 16 '23 at 14:26

Ed Morton

Thanks, great answer. I actually tried this, but my IFS was in front of while and it didn't work. Moving it to the read is the trick here. – Summer-Sky Jan 18 '23 at 14:26
PS: I read https://unix.stackexchange.com/questions/169716, but it gives no alternative. Also, it assumes that one only tries to convert text into another form of text (awk)... Sometimes you have CSV-ish lists that provide input for programs to run, programs that sadly have awkward programming-language bindings for a simple task (e.g. resizing and renaming a picture). That's where you quickly need the convoluted syntax below: the simplest API to do the task quickly (with little human involvement), no matter how long it takes computationally. – Summer-Sky Jan 18 '23 at 15:36
The whole Q&A is about using a shell loop to process text, not about using a shell loop to sequence calls to other tools; the latter is why shells have loops. As for alternatives, there are too many to list, including just using grep, sed, awk, etc., depending on what you want to do with the input, how it's formatted, etc. – Ed Morton Jan 18 '23 at 15:52
Yeah, maybe I should have written "call a program with the first item of arr", for example `echo`. – Summer-Sky Jan 18 '23 at 15:55
I wouldn't choose `echo` for that, as it's easily emulated in `awk`, etc., so it's not a good example of when you'd need a shell loop reading text. Calling `md5sum` or similar on a part of the input line is something you can't do internally with mandatory POSIX tools, so maybe that would be a good example: reading a CSV with 3 fields and outputting it with the 2nd field replaced by its md5sum. – Ed Morton Jan 18 '23 at 15:58
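A minimal sketch of that example, where a shell loop is justified because every line needs an external command. It assumes simple CSV (no quoted fields or embedded commas) and md5sum from GNU coreutils; hash_second_field is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each 3-field CSV line on stdin, print the line
# with its 2nd field replaced by the MD5 digest of that field.
# Assumes plain CSV (no quoted fields) and GNU coreutils md5sum.
hash_second_field() {
    local f1 f2 f3 digest
    while IFS=',' read -r f1 f2 f3; do
        # printf '%s' hashes the field without a trailing newline;
        # md5sum prints "<digest>  -", so keep only the part before the space.
        digest=$(printf '%s' "$f2" | md5sum)
        digest=${digest%% *}
        printf '%s,%s,%s\n' "$f1" "$digest" "$f3"
    done
}

hash_second_field <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
```

For the line `1,a,info` this should print `1,0cc175b9c0f1b6a831c399e269772661,info`, since that is the MD5 digest of the single character a.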

Jetchisel · Answer 2 · 2023-01-18T16:57:50.083

If the loadable builtin csv is available/acceptable, something like:

help csv
csv: csv [-a ARRAY] string
    Read comma-separated fields from a string.
    
    Parse STRING, a line of comma-separated values, into individual fields,
    and store them into the indexed array ARRAYNAME starting at index 0.
    If ARRAYNAME is not supplied, "CSV" is the default array name.

The script.

#!/usr/bin/env bash

enable csv || exit

while IFS= read -r line && csv -a arr "$line"; do
  printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

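See help enable.

With bash 5.2+ there is a default path for the loadables in config-top.h, which should be configurable at compile time.

At run time, enable looks for loadable builtins in the colon-separated list of directories stored in BASH_LOADABLES_PATH.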
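If the loadable may be missing on some machines, a script can try it and fall back to read -a. A sketch, assuming the Debian/Ubuntu location /usr/lib/bash used by the bash-builtins package and bash 4.4+ (where enable -f searches BASH_LOADABLES_PATH for a file name without a slash); split_line is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Assumption: loadables live in /usr/lib/bash (Debian/Ubuntu bash-builtins);
# an already exported BASH_LOADABLES_PATH takes precedence.
BASH_LOADABLES_PATH=${BASH_LOADABLES_PATH:-/usr/lib/bash}

# Hypothetical helper: split the CSV line in $2 into the array named by $1.
if enable -f csv csv 2>/dev/null; then
    split_line() { csv -a "$1" "$2"; }
else
    # Fallback without the loadable: plain comma splitting, so quoted
    # fields such as "info, here" are NOT handled correctly.
    split_line() { IFS=',' read -r -a "$1" <<< "$2"; }
fi

while IFS= read -r line; do
    split_line arr "$line"
    printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
```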

edited Jan 18 '23 at 16:57

answered Jan 16 '23 at 17:44

Jetchisel

_Very_ nice; TIL. The OP's simple data doesn't call for this, but as soon as they're trying to read `4,d,"info, here"` in their input it becomes _very_ appropriate. – Charles Duffy Jan 16 '23 at 18:16
@CharlesDuffy Playing with CSV means caring about the number of columns, and potentially reading more than one line for one row if some field contains a *newline*! Have a look at [How I parse CSV](https://stackoverflow.com/a/69514496/1765658) – F. Hauri - Give Up GitHub Jan 18 '23 at 08:51
How to enable csv in Ubuntu bash? `apt install bash-builtins`, then `enable -f /usr/lib/bash/csv csv`. Thanks @F.Hauri-GiveUpGitHub – Summer-Sky Jan 18 '23 at 16:39
@Summer-Sky, if `/usr/lib/bash/` is where the loadable builtins are, then that should be fine. Look at @F.Hauri's post from the links posted; also, in version `5.2` there is a compile-time option for where to put them. – Jetchisel Jan 18 '23 at 16:43

F. Hauri - Give Up GitHub · Answer 3 · 2023-01-18T08:40:48.080

`readarray` (`mapfile`) and `read -a` disambiguation

First, `readarray` == `mapfile`:

help readarray
readarray: readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from a file into an array variable.
    
    A synonym for `mapfile'.

Then

help mapfile
mapfile: mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from the standard input into an indexed array variable.
    
    Read lines from the standard input into the indexed array variable ARRAY, or
    from file descriptor FD if the -u option is supplied.  The variable MAPFILE
    is the default ARRAY.
    
    Options:
      -d delim    Use DELIM to terminate lines, instead of newline
      -n count    Copy at most COUNT lines.  If COUNT is 0, all lines are copied
      -O origin   Begin assigning to ARRAY at index ORIGIN.  The default index is 0
      -s count    Discard the first COUNT lines read
      -t          Remove a trailing DELIM from each line read (default newline)
      -u fd       Read lines from file descriptor FD instead of the standard input
      -C callback Evaluate CALLBACK each time QUANTUM lines are read
      -c quantum  Specify the number of lines read between each call to
                  CALLBACK
...

While `read -a`:

help read
read: read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
    Read a line from the standard input and split it into fields.
    
    Reads a single line from the standard input, or from file descriptor FD
    if the -u option is supplied.  The line is split into fields as with word
    splitting, and the first word is assigned to the first NAME, the second
    word to the second NAME, and so on, with any leftover words assigned to
    the last NAME.  Only the characters found in $IFS are recognized as word
    delimiters.
...
    Options:
      -a array    assign the words read to sequential indices of the array
                  variable ARRAY, starting at zero
...

Note:

Only the characters found in $IFS are recognized as word delimiters. Useful with -a flag!

Create an array from a split string

To create an array by splitting a string, you could either use read:

IFS=, read -ra myArray <<<'A,1,spaced string,42'
declare -p myArray

declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

Or use mapfile, but as this command is intended to work on whole files, the syntax is somewhat counter-intuitive:

mapfile -td, myArray < <(printf %s 'A,1,spaced string,42')
declare -p myArray

declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

Or, if you want to avoid the fork of `< <(printf ...)`, you have to strip the newline that the here-string appends to the last field:

mapfile -td, myArray <<<'A,1,spaced string,42'
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray

declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")

This will be a little quicker, but not more readable...
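The no-fork variant can be wrapped in a reusable function. A sketch, assuming bash 4.4+ (for mapfile -d) and a nameref to pass the target array by name; split_on_comma is a hypothetical name, and the caller must not pass the name _split_out itself:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the no-fork mapfile split.
# $1: name of the array to fill, $2: the string to split on commas.
split_on_comma() {
    local -n _split_out=$1
    mapfile -td, _split_out <<< "$2"
    # The here-string appends a newline, which ends up on the last field.
    _split_out[-1]=${_split_out[-1]%$'\n'}
}

split_on_comma myArray 'A,1,spaced string,42'
declare -p myArray
```

declare -p myArray should then show the same four elements as the examples above.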

For your sample:

mapfile -t rows <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
for row in "${rows[@]}"; do
    IFS=, read -ra cols <<<"$row"
    declare -p cols
done

declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")

for row in "${rows[@]}"; do
    IFS=, read -ra cols <<<"$row"
    printf ' %s | %s\n' "${cols[0]}" "${cols[2]}"
done

 1 | info
 2 | inf
 3 | in

Or even, if you really want to use readarray:

for row in "${rows[@]}"; do
    readarray -td, cols <<<"$row"
    # the here-string adds a newline to the last field; strip it
    cols[-1]=${cols[-1]%$'\n'}
    declare -p cols
done

declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")

Playing with `callback` option:

(I added some spaces to the last line.)

testfunc() { 
    local IFS array cnt line
    read cnt line <<< "$@"
    IFS=,
    read -a array <<< "$line"
    printf ' [%3d]: %3s | %3s :: %s\n' $cnt "${array[@]}"
}
mapfile -t -C testfunc -c 1  <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo   
HEREDOC

 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo
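mapfile passes the callback two extra arguments, the index of the element about to be assigned and the line itself, so the callback can take them from $1 and $2 directly instead of re-reading "$@". A minimal sketch assuming bash 4.4+; show_row is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical callback. mapfile evaluates it as:  show_row INDEX LINE
# INDEX is the index of the element about to be assigned; LINE is the
# line itself (with -t, already without its trailing newline).
show_row() {
    local -a fields
    IFS=',' read -r -a fields <<< "$2"
    printf ' [%3d]: %s\n' "$1" "${fields[1]}"
}

# -c 1 runs the callback for every line; the lines still end up in rows.
mapfile -t -C show_row -c 1 rows <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
```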

Same, with `-u` flag:

Open the file descriptor:

exec {mydoc}<<HEREDOC
1,a,info
2,b,inf
3,c d,in fo   
HEREDOC

Then

mapfile -u $mydoc -C testfunc -c 1

 [  0]:   1 |   a :: info
 [  1]:   2 |   b :: inf
 [  2]:   3 | c d :: in fo

And finally close the file descriptor:

exec {mydoc}<&-

About the bash `csv` module:

For further information about `enable -f /path/to/csv csv`, RFCs and limitations, have a look at my previous post about How to parse a CSV file in Bash?

Also a great result, but can you move the important part, the code for my sample, to the top? The rest is a great resource. Then I'll give it a thumbs up too ;) – Summer-Sky, Jan 18 '23 at 14:36
Have a look at how I use `readarray -td, myArray` and `myArray[-1]=${myArray[-1]%$'\n'}` to avoid the fork of `<(printf...)`. (You begin your answer with *The solution is...*, with which I don't agree. ;) – F. Hauri - Give Up GitHub, Jan 18 '23 at 16:56
I don't agree with first populating an array with the whole file and then iterating through it. Also, I don't get why you prove that readarray is synonymous with mapfile when the question already implies this. And starting out with the docs is not the order in which someone expects answers: first the answer, then additional explanations. – Summer-Sky, Jan 23 '23 at 09:39

score 0 · Answer 4 · answered Jan 16 '23 at 11:29

The solution is readarray -t -d, arr < <(printf "%s," "$r")

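The special part is < <(...). I could not find a proper explanation of why readarray first needs a redirection arrow and then a process substitution, neither in the TLDP page on process substitution nor on SS64.
My final understanding is that <(...) runs the command asynchronously and is replaced by the name of a pipe (or named pipe) carrying its output, typically something like /dev/fd/63. readarray does not accept a file name as an operand; it reads only standard input (or the descriptor given with -u). Putting that name where a file would go behind < redirects it onto readarray's standard input, and since no pipeline is involved, readarray runs in the current shell and arr is still set afterwards.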

example:

while IFS= read -r r ; do
   echo "$r"
   readarray -t -d, arr < <(printf "%s," "$r")
   echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
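To see the two pieces separately, a sketch (assuming bash 4.4+ for readarray -d) that prints what <(...) expands to and then feeds it to readarray, either on standard input with < or on another descriptor with -u:

```shell
#!/usr/bin/env bash
r='1,a,info'

# <(...) expands to a file name (on Linux typically /dev/fd/63).
echo <(true)

# readarray takes no file-name operand, so the file is redirected
# onto its standard input with '<' ...
readarray -t -d, arr < <(printf '%s,' "$r")
declare -p arr

# ... or onto another descriptor (3 here) that readarray reads with -u.
readarray -t -d, -u 3 arr2 3< <(printf '%s,' "$r")
declare -p arr2
```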

Anyway, this is just a reminder for myself, because I keep forgetting it and readarray is the only place where I actually need this.

The question was also mostly answered in a few other posts (one of them explaining why the pipe isn't working), but they are difficult to find and their reasoning is hard to follow.

For example, the shopt -s lastpipe solution is not obvious at first. It turns out that in bash each element of a pipeline normally runs in a subshell, not in the main shell, so state changes made there (such as filling arr) are lost when the pipeline ends. This option makes the last element of the pipeline run in the main shell instead, but only while job control is inactive, so it has no effect in an interactive shell by default.

shopt -s lastpipe
while IFS= read -r r ; do
    echo "$r"
    printf "%s," "$r" | readarray -t -d, arr
    echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC

One alternative to lastpipe is to do all of the work inside the subshell:

while IFS= read -r r ; do
    echo "$r"
    printf "%s," "$r" | {
        readarray -t -d, arr
        echo "${arr[0]}"
    }
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
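To see the difference the option makes, a minimal sketch (assuming bash 4.4+ for readarray -d) that fills arr through a pipeline with and without lastpipe. set +m switches job control off explicitly; a script already runs without it, but in an interactive shell this also disables job control for the rest of the session:

```shell
#!/usr/bin/env bash
# lastpipe only takes effect while job control is off.
set +m

unset arr
shopt -u lastpipe
printf '%s,' '1,a,info' | readarray -t -d, arr
echo "without lastpipe: ${#arr[@]} elements"   # readarray ran in a subshell

shopt -s lastpipe
printf '%s,' '1,a,info' | readarray -t -d, arr
echo "with lastpipe: ${#arr[@]} elements"      # readarray ran in this shell
```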

Summer-Sky

`while read r ; do` needs to be `while IFS= read -r r ; do` unless you have a specific need to translate escape sequences and strip leading/trailing blanks from the input. – Ed Morton Jan 16 '23 at 14:38
Note that lastpipe doesn't work in shells with job control enabled, so relying on it is perilous. – Charles Duffy Jan 16 '23 at 18:14
@CharlesDuffy Can you provide more context? Do you mean `set +m`? I was referring to the interactive shell, which is explained in https://unix.stackexchange.com/questions/136206/readarray-or-pipe-issue – Summer-Sky Jan 18 '23 at 14:44
@EdMorton Thank you. My documentation (bash 5.1.16(1) man page) does not say anything about leading/trailing blanks with `-r`... but yes, I did not care about backslash escape sequences in the content. But great point! – Summer-Sky Jan 18 '23 at 15:01
The `-r` is to stop escape sequences from being interpreted; it's the `IFS=` that stops leading/trailing spaces from being stripped. Similar to always quoting your variables by default, you need to always use both of those by default (e.g. even when you "did not care about backslash escape sequences"), removing either or both only if/when necessary. – Ed Morton Jan 18 '23 at 15:03
@EdMorton Thanks for the heads-up. I will keep that in mind. – Summer-Sky Jan 18 '23 at 15:13
@EdMorton is my mentioned interpretation correct? > "My final understanding is that, <(...) opens a named pipe and readarray is waiting for it to close. By moving this in place of a file behind < it is handled by bash as a file input and (anonymously) piped into stdin." – Summer-Sky Jan 18 '23 at 15:46
Sounds right to me. – Ed Morton Jan 18 '23 at 15:50
BTW, stay clear of TLDP -- their bash documentation has a lot of outright incorrect information. The [BashGuide](https://mywiki.wooledge.org/BashGuide) and [BashFAQ](https://mywiki.wooledge.org/BashFAQ) were written _explicitly_ to have a less outdated/wrong/bad-practice-demonstrating reference available, after denizens of the #bash IRC channel got tired of answering questions about problems caused by people copying examples they'd seen in the ABS; the [bash-hackers' wiki](https://wiki.bash-hackers.org/) is also pretty good. – Charles Duffy Jan 18 '23 at 18:32
