The question is unclear. If it is a rewrite of a previous question, please make it self-contained.
To “parse” a CSV file in Bash while turning its headers into array variable names and storing each column as an array, one can use “pointers” in Bash (namerefs), i.e. the `declare -n` construct.
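As a minimal sketch of what `declare -n` does on its own (the variable names `fruits`, `target` and `ref` are made up for this illustration), a nameref lets code reach an array whose name is only known as a string at run time:

```bash
#!/bin/bash
fruits=(apple banana)
target=fruits              # the array's name, held as a plain string
declare -n ref="$target"   # ref now acts as an alias for the array "fruits"
ref+=(cherry)              # the append goes to fruits through the alias
printf '%s\n' "${fruits[@]}"
```

This needs Bash 4.3 or newer, where namerefs were introduced.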
The entire approach has major drawbacks though:
- A function modifies global variable state instead of merely providing output and leaving it up to the caller to (not) process it. This is hard to debug and bad for code isolation and reusability.
- Changing a script’s variables based on input data is not a great idea. If you read a CSV column called `PATH`, for example, it becomes obvious why this is not the most secure way of data processing.
- Unfortunately, Bash doesn’t have a (straightforward) equivalent of a `struct` (other than `declare -A`) such that it could be nested in another array. There goes the dream of per-row CSV data representation. Storing each column in a separate array makes many inconsistent states representable, such as column arrays with different sizes.
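One possible way to soften the last two drawbacks is to keep every cell in a single associative array keyed by row index and header, so that header text never becomes a variable name. The sketch below is only an illustration and not part of the approach discussed here. The function name `parse_csv_cells`, the nameref `cells_ref` and the `row,header` key format are all assumptions made for this example.

```bash
#!/bin/bash
# Hypothetical sketch: store each cell under the key "row,header"
# in ONE associative array supplied by the caller.
parse_csv_cells() {
    local -n cells_ref="$1"   # name of the caller's associative array
    local -a headers line
    local -i index row=0
    IFS=, read -ra headers
    while IFS=, read -ra line; do
        for index in "${!line[@]}"; do
            cells_ref["${row},${headers[index]}"]="${line[index]}"
        done
        row+=1                # integer attribute makes this arithmetic
    done
}

declare -A cells=()
parse_csv_cells cells <<'CSV'
PATH,b
1,2
x,y
CSV
# A column named PATH is just part of a key here, not a variable.
printf '%s\n' "${cells[0,PATH]}" "${cells[1,b]}"
```

The caller still passes the target array by name, so the first drawback is only partly addressed, and the array passed in must not itself be called `cells_ref`.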
With all that said, let’s read the CSV without unnecessary subshells, redirections, or expensive external processes (`head`, `tail`), and without trying to assign variables “behind a pipe”, which would have no effect in the local (`fork()` parent) shell:
#!/bin/bash
parse_csv() {
    local -a headers line
    local header
    local -i index
    # First line: the column names.
    IFS=, read -ra headers
    # Create one empty global array per header.
    for header in "${headers[@]}"; do
        declare -ag "${header}=()"
    done
    # Remaining lines: append each field to the array named by its header.
    while IFS=, read -ra line; do
        for index in "${!line[@]}"; do
            local -n column="${headers[index]}"
            column+=("${line[index]}")
        done
    done
}
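The remark about assigning variables “behind a pipe” can be checked in isolation. In this small sketch (the variable names `piped` and `redirected` are made up for the demonstration), the loop on the right-hand side of a pipe runs in a subshell by default, so its counter never reaches the parent shell, whereas the loop fed by a redirection runs in the current shell:

```bash
#!/bin/bash
piped=0
# The while loop is the last element of a pipeline: it runs in a subshell.
printf 'a\nb\n' | while read -r line; do piped=$((piped + 1)); done
echo "piped: $piped"

redirected=0
# Same loop, but fed by a here-string: it runs in the current shell.
while read -r line; do redirected=$((redirected + 1)); done <<< $'a\nb'
echo "redirected: $redirected"
```

Run as a script with default shell options, this should print `piped: 0` followed by `redirected: 2`, which is why a function like `parse_csv` is better fed through a redirection or here-document than placed at the end of a pipeline.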
And of course we should test it using a bit of (valid) data. As noted above, handling of invalid data (i.e. correct error reporting) would be a challenge with this setup, so the whole example assumes (and contains) valid inputs only:
#!/bin/bash
set -euo pipefail
parse_csv <<- BLAH
a,b,c,d,e
1,2,3,4,5
f,g,h,i,j
blah,foo,meh,bar,oops
BLAH
for column in {a..e}; do
    declare -n array="$column"
    printf '%s\n' "${array[*]@A}"
done
Here’s the output from the above, showing that each column is now (indeed) a Bash array:
declare -a a=([0]="1" [1]="f" [2]="blah")
declare -a b=([0]="2" [1]="g" [2]="foo")
declare -a c=([0]="3" [1]="h" [2]="meh")
declare -a d=([0]="4" [1]="i" [2]="bar")
declare -a e=([0]="5" [1]="j" [2]="oops")
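Since the data ends up stored per column, getting a row back means indexing every column array at the same position. A rough sketch of that, assuming all columns have the same length (`print_row` is a hypothetical helper, and the two arrays are copied from the output above):

```bash
#!/bin/bash
# Two of the column arrays, copied from the output shown above.
declare -a a=([0]="1" [1]="f" [2]="blah")
declare -a b=([0]="2" [1]="g" [2]="foo")

# print_row ROW COLUMN...: print one row as comma-separated fields.
print_row() {
    local -i row="$1"; shift
    local name
    local -a fields=()
    for name in "$@"; do
        local -n col="$name"       # re-point the nameref at each column
        fields+=("${col[row]}")
    done
    local IFS=,                    # join fields with commas in "${fields[*]}"
    printf '%s\n' "${fields[*]}"
}

print_row 1 a b
```

With the data above, `print_row 1 a b` should print `f,g`. The column names passed in must not collide with the helper's own local names such as `col`.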