3

I am trying to read a csv file into a bash associative array but am not getting the results I expect.

Using Bash 5.0.18

Bellum:fox3-api rocky$ bash --version
GNU bash, version 5.0.18(1)-release (x86_64-apple-darwin19.5.0)

Contents of foobar.csv

Bellum:scripts rocky$ cat ./foobar.csv
foo-1,bar-1
foo-2,bar-2
foo-3,bar-3

Contents of problem.sh

#!/usr/bin/env bash

declare -A descriptions
while IFS=, read name title; do
      echo "I got:$name|$title"
      descriptions[$name]=$title
done < foobar.csv

echo ${descriptions["foo-1"]}
echo ${descriptions["foo-2"]}
echo ${descriptions["foo-3"]}

Actual Output from problem.sh

Bellum:scripts rocky$ ./problem.sh
I got:foo-1|bar-1
I got:foo-2|bar-2

bar-2

Bellum:scripts rocky$

Desired output:

I got:foo-1|bar-1
I got:foo-2|bar-2
I got:foo-3|bar-3    
bar-1
bar-2
bar-3

Comment Requested Outputs

    Bellum:scripts rocky$ head -n 1 ./foobar.csv | hexdump -C
    00000000  ef bb bf 66 6f 6f 2d 31  2c 62 61 72 2d 31 0d 0a  |...foo-1,bar-1..|
    00000010
    Bellum:scripts rocky$ od -c foobar.csv
    0000000  357 273 277   f   o   o   -   1   ,   b   a   r   -   1  \r  \n
    0000020    f   o   o   -   2   ,   b   a   r   -   2  \r  \n   f   o   o
    0000040    -   3   ,   b   a   r   -   3
    0000050

Cyrus's dos2unix change

    #!/usr/bin/env bash
    
    declare -A descriptions
    dos2unix < foobar.csv | while IFS=, read name title; do
          echo "I got:$name|$title"
          descriptions[$name]=$title
    done
    
    echo ${descriptions["foo-1"]}
    echo ${descriptions["foo-2"]}
    echo ${descriptions["foo-3"]}

Output of Cyrus's dos2unix change

    Bellum:scripts rocky$ ./problem.sh
    I got:foo-1|bar-1
    I got:foo-2|bar-2
    
    
    
    
    Bellum:scripts rocky$

The csv file is made on a Mac by saving as csv from Microsoft Excel. Thanks in advance for any insights.

Hybrid Solution

For future readers: this problem was actually two issues. The first came from how I saved my CSV file from a Microsoft Excel for Mac workbook. I used Save As... with the "CSV UTF-8" format (the first CSV file format listed in Excel's drop-down menu). This prepends extra bytes (a UTF-8 byte order mark, as identified in the comments) that interfered with the read command in bash. Interestingly, these bytes don't show up in the output of cat (see the original problem description above). Saving the CSV from Excel as "Comma Separated Values" instead (much further down the drop-down list of formats) got rid of this first problem.
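
Since cat hides these bytes, here is a minimal sketch of how one might check a file for them before feeding it to read. The sample file is generated on the fly to mimic the export described above (BOM, CRLF endings, no final newline); the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Build a sample that mimics the Excel "CSV UTF-8" export (an assumption
# based on the od -c output in the question): BOM, CRLF, no final newline.
sample=$(mktemp)
printf '\357\273\277foo-1,bar-1\r\nfoo-2,bar-2\r\nfoo-3,bar-3' > "$sample"

# Dump the first three bytes as hex; a UTF-8 BOM is ef bb bf.
first_bytes=$(head -c 3 "$sample" | od -An -tx1 | tr -d ' \n')
if [ "$first_bytes" = "efbbbf" ]; then
  echo "UTF-8 BOM present"
fi

# Count carriage returns by deleting every other byte and counting the rest.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
echo "carriage returns: $cr_count"

rm -f "$sample"
```

For the sample above this reports a BOM and two carriage returns, which matches the hexdump and od output shown earlier.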

Secondly, @Léa Gris and @glenn jackman pointed me in the right direction for modifiers to my script that helped with some newline and carriage return characters that were present in the Excel saved file.

Thanks, everyone. I spent a full day trying to figure this out. Lesson learned: I should have turned to Stack Overflow much sooner.

dmjones
  • your code works for me; I'd be curious as to what exactly is in the array => after the `while` loop add `typeset -p descriptions` to get a look at the complete array definition; may also want to verify the contents of the data file => `od -c foobar.csv`, then review the output for any non-printing characters other than `\n` – markp-fuso Oct 24 '20 at 21:46
  • Add output of `head -n 1 ./foobar.csv | hexdump -C` to your question (no comment). – Cyrus Oct 24 '20 at 21:52
  • Check the CSV file and make sure it doesn't have CR characters. Use `dos2unix foobar.csv` to fix it if it does. – Barmar Oct 24 '20 at 21:53
  • I added markp-fuso's and Cyrus's requested output and a description of how the csv is created. – dmjones Oct 24 '20 at 22:04
  • from the `od -c` output it looks like this is more than just an ascii text file ... a) not sure what the `357 273 277` is supposed to be (ie, shouldn't be there in an ascii text file), b) first 2 lines have DOS endings (`\r \n`) (dos2unix should normally get rid of those) and c) the 3rd line is missing the EOL characters (ie, no `\r \n`); I'm not familiar with mac/excel so not sure what to do from here (short of surgery on the data file) if there's no way to save the data as an ascii file – markp-fuso Oct 24 '20 at 22:41
  • fwiw, found a few hits for the `357 273 277` - appears to be a 'utf-8 byte order mark'; couple ideas for removing (if can't eliminate during the save of the file from excel): [this](https://unix.stackexchange.com/q/304177) and [this](https://stackoverflow.com/q/45240387/) – markp-fuso Oct 24 '20 at 22:53
  • See: [How can I remove the BOM from a UTF-8 file?](https://stackoverflow.com/questions/45240387) – Léa Gris Oct 25 '20 at 02:16

3 Answers

3

Here's why you don't get the output you expect:

    Bellum:scripts rocky$ od -c foobar.csv
    0000000  357 273 277   f   o   o   -   1   ,   b   a   r   -   1  \r  \n
    0000020    f   o   o   -   2   ,   b   a   r   -   2  \r  \n   f   o   o
    0000040    -   3   ,   b   a   r   -   3
    0000050
  1. the name on the first line does not contain just "foo-1" -- there are extra characters (a UTF-8 byte order mark) in front of it.
    • They can be removed with "${name#$'\357\273\277'}"
  2. the last line does not end with a newline, so the while-read loop only iterates twice.
    • read returns non-zero if it can't read a whole line, even if it reads some characters.
    • since read returns "false", the while loop ends.
    • this can be worked around by using:
      while IFS=, read -r name title || [[ -n $title ]]; do ... 
      #............................. ^^^^^^^^^^^^^^^^^^ 
      
    • or, just fix the file.
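
The behaviour described in point 2 can be reproduced with a small sketch. It writes a two-record file whose last line has no newline and counts the iterations of a plain loop and a guarded loop. The guard here tests `$name` rather than `$title` (an assumption on my part, so that a final record with an empty second field would still be kept); the counter names are illustrative.

```shell
#!/usr/bin/env bash
# Two records; the second one is not terminated by a newline.
data=$(mktemp)
printf 'foo-1,bar-1\nfoo-2,bar-2' > "$data"

# Plain loop: read fills name/title for the last record but returns
# non-zero at end of file, so the loop body never runs for it.
plain=0
while IFS=, read -r name title; do
  plain=$((plain + 1))
done < "$data"

# Guarded loop: if read fails but still stored something in name,
# run the body one more time.
guarded=0
while IFS=, read -r name title || [ -n "$name" ]; do
  guarded=$((guarded + 1))
done < "$data"

echo "plain loop: $plain record(s), guarded loop: $guarded record(s)"
rm -f "$data"
```

The plain loop sees one record and the guarded loop sees both.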

Result:

BOM=$'\357\273\277'
CR=$'\r'

declare -A descriptions
while IFS=, read -r name title || [[ -n $title ]]; do
  descriptions["${name#$BOM}"]=${title%$CR}
done < foobar.csv

declare -p descriptions
echo "${descriptions["foo-1"]}"
echo "${descriptions["foo-2"]}"
echo "${descriptions["foo-3"]}"

Output:

declare -A descriptions=([foo-1]="bar-1" [foo-2]="bar-2" [foo-3]="bar-3" )
bar-1
bar-2
bar-3
glenn jackman
  • Thanks for your input. Your comment regarding the extra characters before "foo-1" led me to further investigate how I was creating the file and to an eventual solution (see my edit at bottom of the original post). Your comments on how to edit my read command, along with @Léa Gris, fixed my problem. Thanks for taking the time to comment. – dmjones Oct 25 '20 at 02:03
3

This will work with your input file regardless of whether it uses Unix or DOS newlines, whether it starts with a UTF-8 BOM, and whether the last line ends with a newline before end of file:

#!/usr/bin/env bash

declare -A descriptions
# IFS=$',\r\n' lets read handle either Unix or DOS newlines
# read -r keeps backslashes literal instead of treating them as escapes
# || [ "$name" ] makes sure the last line is captured
# even if it does not end with a newline marker
while IFS=$',\r\n' read -r name title || [ "$name" ]; do
      echo "I got:$name|$title"
      descriptions[$name]=$title
done < <(
  # Filter-out UTF-8 BOM if any
  sed $'1s/^\357\273\277//' foobar.csv
)

echo "${descriptions["foo-1"]}"
echo "${descriptions["foo-2"]}"
echo "${descriptions["foo-3"]}"

# A shorter option for debugging is to dump the variable as a declaration
typeset -p descriptions

Now, a very compact way to transfer your CSV into the associative array all at once:

#!/usr/bin/env bash

# shellcheck disable=SC2155 # Safe generated assignment with printf %q
declare -A descriptions="($(
  # Collect all values from file into an array
  IFS=$'\r\n,' read -r -d '' -a elements < <(
    # Discard the UTF-8 BOM from the input file if any
    sed $'1s/^\357\273\277//' foobar.csv
  )
  # Format the elements into an Associative array declaration [key]=value 
  printf '[%q]=%q ' "${elements[@]}"
))"

echo "${descriptions["foo-1"]}"
echo "${descriptions["foo-2"]}"
echo "${descriptions["foo-3"]}"

# A shorter option for debugging is to dump the variable as a declaration
typeset -p descriptions
Léa Gris
  • This was very helpful. It got half of the problem solved and I appreciate showing the full working script. The only thing this didn't solve was the problem with the input file from Excel (see the edit at bottom of original post). Thank you so much! – dmjones Oct 25 '20 at 02:00
  • @dmjones I added the automatic deletion of the BOM if any, so you don't have to worry about it being created: – Léa Gris Oct 25 '20 at 02:28
  • You are awesome. – dmjones Oct 25 '20 at 02:30
1

The issue is with the first 3 bytes. You can remove them with:

dd bs=1 skip=3 if=foobar.csv of=foobar2.csv

and try with foobar2.csv
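
Note that the dd command drops the first three bytes unconditionally, so it would damage a file that has no BOM. A possible variant, sketched here with a hypothetical helper named `strip_bom`, checks for the BOM first and only then skips it:

```shell
#!/usr/bin/env bash
# strip_bom FILE: print FILE to stdout without a leading UTF-8 BOM.
# (Hypothetical helper name, not part of any existing tool.)
strip_bom() {
  if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    tail -c +4 "$1"   # start output at the 4th byte, skipping the BOM
  else
    cat "$1"          # no BOM: pass the file through unchanged
  fi
}

# Demonstration on two throwaway files, one with a BOM and one without.
with_bom=$(mktemp)
without_bom=$(mktemp)
printf '\357\273\277foo-1,bar-1\n' > "$with_bom"
printf 'foo-1,bar-1\n' > "$without_bom"

strip_bom "$with_bom"
strip_bom "$without_bom"

rm -f "$with_bom" "$without_bom"
```

Both calls print the same line, `foo-1,bar-1`. Usage would then be along the lines of `strip_bom foobar.csv > foobar2.csv`.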

Philippe
  • Your comment is true regarding the first three bytes. I later determined what was causing it. Thanks for the help. – dmjones Oct 25 '20 at 02:04