1

I am new to Bash scripting. Could someone please explain how the IFS delimiter works? I have a file:

 a,b,c,d
1,2,"one,two",

I read it:

while IFS=, read a b c d
do
echo $d
done < $file

The result:

d
two"

But what about the comma at the end? I expected:

d
two",

If I read the next file:

a,b,c,d
1,2,"one,two",,

I get:

d
two",,

Please explain what the difference is and how IFS works.

Charles Duffy
QwertyBot
  • 1
    Why expect a comma at the end? It is removed because of the value of `$IFS`. – Jonathan Leffler Jul 07 '21 at 21:04
  • BTW, always `echo "$d"`, not `echo $d`. See [I just assigned a variable, but `echo $variable` prints something different!](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else) – Charles Duffy Jul 07 '21 at 21:09
  • 2
    Anyhow, "why do characters in IFS not always get stripped from the tail of a variable populated by read?" is a reasonable question, but if that's the question you want to ask it you should ask it _explicitly_ instead of making us guess if it's what you really want to know. (Short answer: `read` only performs as many splits as needed for the number of variables it's asked to populate; if you made it `read a b c d _`, _then_ you'd have the trailing `,`s gone -- example @ https://ideone.com/eg1iCr). – Charles Duffy Jul 07 '21 at 21:10
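
The `read a b c d _` suggestion in the comment above can be tried with a small sketch like the one below. It is only an illustration: the names `sample` and `results` are made up here, and the data is the second file from the question, fed in through a here-string instead of a real file.

```bash
#!/usr/bin/env bash
# Illustrative data: the second file from the question.
sample=' a,b,c,d
1,2,"one,two",,'

results=()
# The extra "_" variable absorbs whatever follows the fourth field,
# so "d" no longer has to hold the rest of the line.
while IFS=, read -r a b c d _; do
    results+=("$d")
    echo "$d"
done <<< "$sample"
```

With the fifth name in place, `d` should hold only `two"` for the data line, with no trailing commas.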

3 Answers

2

Despite IFS (presumably) standing for "internal(?) field separator", it's actually used as a field terminator. From the man page:

Word Splitting

  The shell scans the results of parameter expansion, command substitution,
  and arithmetic expansion that did not occur within double quotes for
  word splitting.

  The shell treats each character of IFS as a delimiter, and splits the
  results of the other expansions into words using these characters as
  field terminators. [...]

The number of words that read splits the input into depends on how many variables it needs to populate. Further, if there are more possible fields than variables, trailing terminators are preserved.

Using `1,2,"one,two",` as the example, we get the following fields for n variables:

  1. `1,2,"one,two",`. All terminators are preserved.
  2. `1` and `2,"one,two",`
  3. `1`, `2`, and `"one,two",`
  4. `1`, `2`, `"one`, and `two"`
  5. `1`, `2`, `"one`, `two"`, and an empty fifth field.
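
The list above can be reproduced with a short sketch along these lines. The variable names (`line`, `last1` through `last5`) are illustrative only; each `read` is given a different number of names and the value of the last name is saved for comparison.

```bash
#!/usr/bin/env bash
line='1,2,"one,two",'

# Same input, one to five names; keep what the last name receives.
IFS=, read -r a <<< "$line";          last1=$a
IFS=, read -r a b <<< "$line";        last2=$b
IFS=, read -r a b c <<< "$line";      last3=$c
IFS=, read -r a b c d <<< "$line";    last4=$d
IFS=, read -r a b c d e <<< "$line";  last5=$e

# printf reuses the format string for each pair of arguments.
printf '%d name(s): last=<%s>\n' \
    1 "$last1" 2 "$last2" 3 "$last3" 4 "$last4" 5 "$last5"
```

The trailing comma survives in the last name only while there is still more than one field left for it to hold; with four names the last one gets `two"`, and with five it is empty.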

In the case of `1,2,"one,two",,` and 4 variables, there is an "empty" field between the two final commas. You need a fifth variable to consume that in order to discard the trailing comma.

Quotes are ignored; there is no way to "escape" a field terminator to allow it to be treated literally.
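
A small sketch of the quoting behaviour, which also covers the exception raised in the comments under this answer: when `-r` is not given, `read` treats a backslash as an escape character. The variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Double quotes in the data are ordinary characters to read,
# so the comma inside them still ends a field.
IFS=, read -r q1 q2 q3 <<< '"one,two",3'

# Deliberately without -r: the backslash makes the next comma literal.
IFS=, read e1 e2 <<< 'x\,y,z'

# With -r the backslash is kept and the comma splits as usual.
IFS=, read -r r1 r2 <<< 'x\,y,z'

printf 'quoted:   <%s> <%s> <%s>\n' "$q1" "$q2" "$q3"
printf 'no -r:    <%s> <%s>\n' "$e1" "$e2"
printf 'with -r:  <%s> <%s>\n' "$r1" "$r2"
```

In practice `-r` is usually what you want, so escaping with a backslash is more of a curiosity than a way to parse CSV.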

chepner
  • What about the case of `1,2,"one,two",,`, where neither of the trailing commas are discarded? – that other guy Jul 07 '21 at 21:17
  • In that case, there are again more possible fields than variables to set. With four variables, the split produces `1`, `2`, `"one`, and `two",,`. – chepner Jul 07 '21 at 21:19
  • 1
    *there is no way to "escape" a field terminator to allow it to be treated literally* -- backslash escapes the field separator and the termination character unless the `-r` option to `read` is used in bash. `IFS=: read a _ <<<'x\:y' && echo "$a"` prints `x:y`. – Grisha Levit Jul 08 '21 at 00:23
  • @GrishaLevit Hm, good to know. I never thought too hard about it; I assumed that only applied to "known" escapes like `\t` for tab. – chepner Jul 08 '21 at 01:14
1

The Bash manual page for the read command says:

One line is read from the standard input, …, split into words as described above in Word Splitting, and the first word is assigned to the first name, the second word to the second name, and so on. If there are more words than names, the remaining words and their intervening delimiters are assigned to the last name. If there are fewer words read from the input stream than names, the remaining names are assigned empty values. The characters in the value of the IFS variable are used to split the line into words using the same rules the shell uses for expansion (described above in Word Splitting).

What you are seeing matches what the manual says will happen.
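
The two rules quoted from the manual (more words than names, fewer words than names) can be checked with a minimal sketch like this one. The inputs and variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# More words than names: the last name receives the remaining words
# together with the delimiters between them.
IFS=, read -r first rest <<< 'a,b,c,d'

# Fewer words than names: the leftover names are set to empty strings.
IFS=, read -r x y z <<< 'a,b'

printf 'first=<%s> rest=<%s>\n' "$first" "$rest"
printf 'x=<%s> y=<%s> z=<%s>\n' "$x" "$y" "$z"
```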

Jonathan Leffler
  • How are you interpreting this to mean that empty fields are ignored only if they're the last in the sequence? – that other guy Jul 07 '21 at 21:29
  • See the section on [Word Splitting](https://www.gnu.org/software/bash/manual/bash.html#Word-Splitting) too. In the first example, there is no word after the final delimiting comma and the comma is eliminated; in the second, there are characters for extra fields, so the delimiters are retained. – Jonathan Leffler Jul 07 '21 at 21:34
  • `a,,b` is three fields, right? Two single-letter fields and an empty field. Why does shuffling their order into `a,b,` reduce the total number of fields? – that other guy Jul 07 '21 at 21:41
  • Because the commas are delimiters, not separators. – Jonathan Leffler Jul 07 '21 at 21:41
  • I don't think that is a canonical interpretation of the word. Generally, a delimiter and separator is the same, and distinct from a terminator – that other guy Jul 07 '21 at 21:46
  • `a,,b` is three fields *if* `read` has been requested to fill 3 variables. With 2 variables, the fields are `a` and `,b`. Only the first comma terminates a field; the other comma is part of the second field. – chepner Jul 07 '21 at 21:49
  • @thatotherguy: OK — then we are going to have to agree to differ on the interpretation of 'separator' and 'delimiter'. In the areas where I work, the terms are different, and 'delimiter' is pretty much the same as 'terminator' — and 'separator' comes between (separates) two fields. – Jonathan Leffler Jul 07 '21 at 21:59
  • POSIX and Bash do both explicitly say "The shell shall treat each character of the IFS as a delimiter and use the delimiters as field **terminators**", so lgtm – that other guy Jul 07 '21 at 22:12
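
The terminator reading discussed in these comments can be observed by counting fields with `read -a`, which splits into as many array elements as there are fields. This is only a sketch; `count_fields` and the `n_*` variables are hypothetical names introduced for the example.

```bash
#!/usr/bin/env bash
# count_fields is a hypothetical helper: it splits its argument on commas
# with read -a and stores the number of resulting fields in "count".
count_fields() {
    local -a parts
    IFS=, read -r -a parts <<< "$1"
    count=${#parts[@]}
}

count_fields 'a,,b';  n_middle=$count    # a, empty, b
count_fields 'a,b,';  n_trailing=$count  # a, b (final comma ends "b")
count_fields 'a,b,,'; n_double=$count    # a, b, empty

echo "$n_middle $n_trailing $n_double"
```

A comma that ends the last field does not start a new, empty one, which is why `a,b,` yields one field fewer than `a,,b`.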
0

Note that bash 5 comes with a loadable CSV module that handles all this mess much more nicely:

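# Locate the loadable builtins relative to the running bash binary.
bash_root=${BASH%/bin/bash}
[[ -d "$bash_root/lib/bash" ]] || exit
BASH_LOADABLES_PATH="$bash_root/lib/bash"
# Load the csv builtin from the loadables directory.
enable -f csv csv

while IFS= read -r line; do
    # Parse one CSV line into the array "fields".
    csv -a fields "$line"
    declare -p fields
done < file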

When "file" contains

 a,b,c,d
1,2,"one,two",
1,2,"one,two",,

that script outputs

declare -a fields=([0]=" a" [1]="b" [2]="c" [3]="d")
declare -a fields=([0]="1" [1]="2" [2]="one,two" [3]="")
declare -a fields=([0]="1" [1]="2" [2]="one,two" [3]="" [4]="")

There's currently no option for the csv command to use anything other than comma as the field separator.

glenn jackman