
My file looks like this:

"dog" 23 "a description of the dog" 123 456 "21"
"cat"  5 "a description of the cat" 987 654 "22"

I'm loading the file line by line into an array:

filename=$1

while read -r line
do
  animal_array=($line)
  # do stuff
done < "$filename"

What I want to see:

animal_array[0] --> "dog"
animal_array[1] --> 23
animal_array[2] --> "a description of the dog"
animal_array[3] --> 123
animal_array[4] --> 456
animal_array[5] --> "21"

What I get:

animal_array[0] --> "dog"
animal_array[1] --> 23
animal_array[2] --> "a
animal_array[3] --> description
animal_array[4] --> of
animal_array[5] --> the
animal_array[6] --> dog"
animal_array[7] --> 123
animal_array[8] --> 456
animal_array[9] --> "21"

I'm struggling to find a way to take the "quotes" into account before I read the line into the array. The quotes themselves need to be kept in the array elements.

Charles Duffy
Tony Radca
  • `bash` isn't really equipped for this kind of parsing; `read` can split a line, but it can't distinguish between quoted and unquoted whitespace to do so. You are better off using something like Python's `csv` module (or the equivalent in the language of your choice) instead. – chepner Jul 24 '18 at 14:38
  • Retaining the quotes is even more problematic; I would seriously reconsider your file format. – chepner Jul 24 '18 at 14:39
  • Unfortunately, bash is being expected as it is the company standard :( – Tony Radca Jul 24 '18 at 14:50
  • Does that standard prevent you from using `awk` or calling external programs? Must you use only native `bash`? – Nic3500 Jul 24 '18 at 15:45
  • I found a way around retaining the quotes. I do it later in the program. – Tony Radca Jul 24 '18 at 17:36
  • I also discovered the following, which lets me break on whitespace and retain quoted strings: `IFS=$'\n' animal_array=( $(xargs -n1 <<<"$line") )`. The only issue I have now is when one of the records in the file has an empty string `""`, in which case everything gets moved up. – Tony Radca Jul 24 '18 at 17:40
  • That version (like any other code depending on unquoted expansion) is buggy, as I describe in a comment on a duplicate discussing it. If you have a field containing only `*`, you'll end up with a bunch of filenames in your array. Any other data that *looks* like a glob -- and remember anything with square brackets, asterisks, `?`s, etc is a glob when expanded unquoted -- can similarly be impacted by runtime options such as `failglob`, `nullglob`, etc, even if there aren't any matching filenames. – Charles Duffy Jul 24 '18 at 17:42
  • Assuming you're targeting a new enough version of bash (4.4 or later), make it `readarray -d '' animal_array < <(xargs printf '%s\0' <<<"$line")` to avoid the worst of those issues. – Charles Duffy Jul 24 '18 at 17:47
  • Even if you don't have a bash new enough for the `readarray -d` option, `readarray -t animal_array < <(xargs printf '%s\n' <<<"$line")` will at least fix your problem with `""`s, and should work properly with bash 4.0. (To work with bash 3.x, you'd want a `while read` loop like the one in my answer instead). – Charles Duffy Jul 24 '18 at 17:49
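A minimal sketch of the `readarray -t` variant suggested in the comments, assuming bash 4.0 or later. Note that `xargs` consumes the double quotes while parsing, so the elements come back without them (the asker mentions re-adding quotes later in the program). The sample line, including its empty `""` field, is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split one line into an array, honoring double-quoted fields.
# Requires bash >= 4.0 for readarray. xargs removes the quotes themselves.

# Hypothetical sample line containing an empty "" field.
line='"dog" 23 "" "a description of the dog" 123'

# xargs parses the quoting and passes each field to printf as one argument;
# printf writes one field per line, and readarray -t stores every line
# (including the empty one) as an element, minus its trailing newline.
readarray -t animal_array < <(xargs printf '%s\n' <<<"$line")

declare -p animal_array
```

Because `xargs` also treats single quotes and backslashes as quoting characters, a description containing an apostrophe (for example `the dog's bed`) would make it fail with an unmatched-quote error. Fields containing literal newlines would also be split, which is what the NUL-delimited `readarray -d ''` form avoids.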

1 Answer


If you don't mean to retain the quotes as data, use the answers to "Bash: Reading quoted/escaped arguments correctly from a string" instead.

That said, the GNU awk extension FPAT can be used for the kind of parsing you're requesting here, if you only need to handle double-quoted strings with literal data (no \" escaped quotes or other oddities within):

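split_quoted_strings() {
  # A field is either a run of non-space, non-quote characters or a
  # double-quoted string (possibly empty, e.g. ""). Output is NUL-delimited:
  # the field count for each line, followed by each field.
  gawk '
    BEGIN {
      FPAT = "([^[:space:]\"]+)|(\"[^\"]*\")"
    }

    {
      printf("%d\0", NF)
      for (i = 1; i <= NF; i++) {
        printf("%s\0", $i)
      }
    }
  ' "$@"
}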

# replace this with whatever you want to have called after a line has been read
handle_array() {
  echo "Read array with contents:"
  printf ' - %s\n' "$@"
  echo
}

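# read the field count, then exactly that many NUL-terminated fields
while IFS= read -r -d '' num_fields; do
  array=( )
  for ((i=0; i<num_fields; i++)); do
    IFS= read -r -d '' piece
    array+=( "$piece" )
  done
  handle_array "${array[@]}"
done < <(split_quoted_strings)  # reads stdin; pass a filename to read a file instead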

...properly emits as output:

Read array with contents:
 - "dog"
 - 23
 - "a description of the dog"
 - 123
 - 456
 - "21"

Read array with contents:
 - "cat"
 - 5
 - "a description of the cat"
 - 987
 - 654
 - "22"
Charles Duffy
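Since the comments mention that bash is the company standard, here is a sketch of a pure-bash alternative (not part of the answer above) that keeps the quotes in each element. It uses bash's `[[ =~ ]]` operator and `BASH_REMATCH` with a field rule similar to the `FPAT` pattern, and it also accepts an empty `""` field. It assumes there are no escaped `\"` quotes inside quoted fields; the function name `split_quoted_line` and the global array `fields` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pure-bash tokenizer that keeps double quotes in the fields.
# Assumption: a field is either "double-quoted text" (no escaped \" inside,
# may be empty) or a run of non-space, non-quote characters.

# Hypothetical helper: fills the global array "fields" from its argument.
split_quoted_line() {
  local rest=$1
  local re='^[[:space:]]*("[^"]*"|[^[:space:]"]+)'
  fields=()
  # Each match consumes at least one character, so the loop terminates.
  # It stops at end of line, or silently at an unmatched opening quote.
  while [[ $rest =~ $re ]]; do
    fields+=( "${BASH_REMATCH[1]}" )
    rest=${rest:${#BASH_REMATCH[0]}}   # drop leading spaces plus the token
  done
}

# Demo on the question's sample lines plus a hypothetical line with "".
while IFS= read -r line; do
  split_quoted_line "$line"
  printf ' - %s\n' "${fields[@]}"
  echo
done <<'EOF'
"dog" 23 "a description of the dog" 123 456 "21"
"cat"  5 "a description of the cat" 987 654 "22"
"bird" 7 "" 1 2 "23"
EOF
```

This avoids external programs entirely, but a regex loop in bash is likely to be noticeably slower on large files than a single `gawk` process.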