I have a file a.txt with the following content:

    aaa
    bbb

When I execute the following script:

while read line
do
    echo $line
done < a.txt > b.txt

the generated b.txt contains the following:

aaa
bbb

The leading spaces of each line have been removed. How can I preserve them?

Chris Stryczynski
Lahiru Chandima

2 Answers


This is covered in the Bash FAQ entry on reading data line-by-line.

The read command modifies each line read; by default it removes all leading and trailing whitespace characters (spaces and tabs, or any whitespace characters present in IFS). If that is not desired, the IFS variable has to be cleared:

# Exact lines, no trimming
while IFS= read -r line; do
  printf '%s\n' "$line"
done < "$file"

As Charles Duffy correctly points out (and as I'd missed by focusing on the IFS issue), if you want to see the spaces in your output you also need to quote the variable when you use it; otherwise the shell will, once again, drop the whitespace.
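
To make the two separate effects visible, here is a small sketch. The `input` variable and the square brackets around each line are my own additions for illustration; the sample text mirrors a.txt from the question.

```shell
# Sample input mirroring a.txt from the question (four leading spaces per line).
input='    aaa
    bbb'

# 1. Default IFS: read strips the leading spaces.
trimmed=$(printf '%s\n' "$input" | while read -r line; do printf '[%s]\n' "$line"; done)

# 2. IFS cleared, unquoted expansion: word splitting drops the spaces again.
unquoted=$(printf '%s\n' "$input" | while IFS= read -r line; do printf '[%s]\n' $line; done)

# 3. IFS cleared, quoted expansion: each line survives intact.
kept=$(printf '%s\n' "$input" | while IFS= read -r line; do printf '[%s]\n' "$line"; done)

printf '%s\n' "$trimmed" "$unquoted" "$kept"
```

Only the third variant should show the four spaces inside the brackets; the first two both print `[aaa]` and `[bbb]`.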

Some notes on the other differences between that quoted snippet and your original code:

The use of the -r argument to read is covered in a single sentence at the top of the previously linked page.

The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines). Without this option, any backslashes in the input will be discarded. You should almost always use the -r option with read.
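
A quick way to see what -r changes, as a sketch with made-up sample strings (`a\tb` and the `one\` / `two` pair are just illustrative input):

```shell
# Input line containing a literal backslash: a\tb
with_r=$(printf '%s\n' 'a\tb' | { IFS= read -r line; printf '%s\n' "$line"; })
without_r=$(printf '%s\n' 'a\tb' | { IFS= read line; printf '%s\n' "$line"; })

# Two input lines, the first ending in a backslash.
# Without -r, the backslash-newline pair is removed and the lines are joined.
joined=$(printf '%s\n' 'one\' 'two' | { IFS= read line; printf '%s\n' "$line"; })

printf '%s\n' "$with_r" "$without_r" "$joined"
```

With -r the backslash is kept (`a\tb`); without it the backslash is dropped (`atb`), and a trailing backslash pulls the next line into the same value (`onetwo`).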

As for using printf instead of echo: the behavior of echo is, somewhat unfortunately, not consistent across environments, and the differences can be awkward to deal with. printf, on the other hand, is consistent and can be used entirely robustly.
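
As a rough illustration of the difference, consider a line whose entire content is `-n` (a value I picked for the example). What echo does with it depends on the shell, so the sketch deliberately makes no claim about the echo result.

```shell
line='-n'

# echo may treat the value as an option: many shells (bash among them)
# print nothing here, while others print "-n". The result is not portable.
from_echo=$(echo "$line")

# printf with a fixed format string always treats the value as data.
from_printf=$(printf '%s\n' "$line")

printf 'echo gave:   [%s]\n' "$from_echo"
printf 'printf gave: [%s]\n' "$from_printf"
```

The printf line prints `[-n]` regardless of the shell, which is the property the loop above relies on.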

Etan Reisner
  • 8
    If you do not give `read` any arguments to use to hold the input (relying on the default variable `REPLY`), no whitespace is stripped and you can omit the modification to `IFS`. That is, `while read -r; do printf '%s\n' "$REPLY"; done < "$file"` – chepner Apr 17 '15 at 02:45
  • 2
    I'm not sure; it doesn't seem to be documented as far as I can tell. It makes some sense if you think of it as zero arguments require splitting the line into zero fields, meaning there is no use for `IFS`. (That assumes you accept that splitting a line into one field is still a split, albeit a degenerate one.) In any case, it is a `bash`ism; POSIX `read` requires at least one argument. – chepner Apr 17 '15 at 02:55
  • 4
    @chepner: `man bash` says about `$REPLY` (emphasis mine): "Set to the _line_ of input read by the read builtin command when no arguments are supplied." Thus, the idea is to read the _whole line as is_, as opposed to _splitting it into fields_. What is counter-intuitive, however, is that you still also have to specify `-r` to avoid backslash interpretation. Note that the (rarely used) `select` construct - where splitting into fields doesn't even enter the picture - also sets `$REPLY` to whatever the user entered (_invariably_ backslash-interpreted, but otherwise also as is). – mklement0 Apr 17 '15 at 03:58
  • 2
    As for why `-r` is still needed to suppress backslash interpretation even when using just `$REPLY` (not specifying any variable names): Not specifying `-r` potentially reads _multiple_ lines at once (joined without newlines), if the input has `\ `-escaped newlines; if using just `$REPLY` _implied_ `-r`, this multi-line behavior would not be available. – mklement0 Apr 17 '15 at 05:14
  • 1
    Very interesting discussion. What this seems to imply to me actually is the potential need for a distinction between `-r` and `-r`-for-newlines-only. – Etan Reisner Apr 17 '15 at 11:39
  • 2
    `read`'s default behavior allows _line continuation_ by ending a line with `\ ` and continuing it on the next line (both the `\ ` and the newline are _discarded_). This is rarely useful, except maybe for _interactive_ input, and, given that the `\ ` in _any_ `\` pair is discarded, typically surprises the user. I think the real issue here is that the behavior of `-r` (_always_ read _only one_ line, keep all backslashes) should have been the _default_ behavior all along, but the current behavior is POSIX-mandated, so we're stuck with it. (cont'd in next comment) – mklement0 Apr 17 '15 at 18:15
  • 2
    (cont'd from prev. comment) The upshot is: as your quote states, you'll almost always want to use `-r`, and I don't see a real need to support line continuation with `-r`. While not quite the same, if you do have the need to read across lines in `bash`, `ksh`, and `zsh`, you can use `read -r -d …`, which does have the advantage that the newlines don't have to be (and shouldn't be) `\ `-escaped. – mklement0 Apr 17 '15 at 18:15
  • Maybe `while IFS= read -r line || [ -n "$line" ];` is the final solution https://stackoverflow.com/a/31398490/456536 – Míng Dec 02 '22 at 08:05
  • Re: "You should almost always use the -r option with read.": then why is `-r` not enabled by default? – pmor May 26 '23 at 12:11
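
The `|| [ -n "$line" ]` variant suggested in the comments above covers input whose last line has no trailing newline. A minimal sketch, with sample input and bracket markers chosen only for illustration:

```shell
# The input's last line ("bbb") has no trailing newline.
# read still stores it in line, but returns non-zero, so the plain loop stops early.
plain=$(printf 'aaa\nbbb' | while IFS= read -r line; do
  printf '[%s]\n' "$line"
done)

# The extra test keeps the loop going when read failed but left data in line.
guarded=$(printf 'aaa\nbbb' | while IFS= read -r line || [ -n "$line" ]; do
  printf '[%s]\n' "$line"
done)

printf 'plain:\n%s\nguarded:\n%s\n' "$plain" "$guarded"
```

The plain loop prints only `[aaa]`, while the guarded loop prints both lines.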

There are several problems here:

  • Unless IFS is cleared, read strips leading and trailing whitespace.
  • echo $line string-splits and glob-expands the contents of $line, breaking it up into individual words, and passing those words as individual arguments to the echo command. Thus, even with IFS cleared at read time, echo $line would still discard leading and trailing whitespace, and change runs of whitespace between words into a single space character each. Additionally, a line containing only the character * would be expanded to contain a list of filenames.
  • echo "$line" is a significant improvement, but still won't correctly handle values such as -n, which it treats as an echo argument itself. printf '%s\n' "$line" would fix this fully.
  • read without -r treats backslashes as escape characters rather than literal content: a backslash is dropped from the value unless it is doubled, and a backslash at the end of a line joins that line to the next one.

Thus:

while IFS= read -r line; do
  printf '%s\n' "$line"
done
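
The string-splitting and glob-expansion points from the list above can be checked with a short sketch. It assumes `mktemp -d` is available for a scratch directory; the file names `f1` and `f2` and the sample values are hypothetical.

```shell
# Unquoted expansion collapses runs of whitespace and trims both ends.
line='  a    b  '
collapsed=$(echo $line)
preserved=$(printf '%s\n' "$line")

# Unquoted expansion also glob-expands: in a directory holding f1 and f2,
# a line consisting only of * turns into the list of file names.
scratch=$(mktemp -d)
touch "$scratch/f1" "$scratch/f2"
globbed=$(cd "$scratch" && line='*' && echo $line)
literal=$(cd "$scratch" && line='*' && printf '%s\n' "$line")
rm -rf "$scratch"

printf '[%s]\n' "$collapsed" "$preserved" "$globbed" "$literal"
```

The quoted printf form keeps both values exactly as stored, while the unquoted echo form yields `a b` and `f1 f2`.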
Charles Duffy
  • 1
    Good advice, but the two-character sequence `\n` does _not_ result in a _newline_, it results in _literal `n`_. By contrast, a `\ `-escaped _actual_ newline causes `read` to read the _following_ line also, and to directly append it to the current one (discarding the `\ ` and the newline). A `\ ` before any other character is simply discarded. – mklement0 Apr 17 '15 at 04:47
  • 3
    Another way of describing the behavior of `read` without `-r`: the input is parsed in the same way a bareword with individually `\ `-escaped characters is parsed by the (POSIX) shell itself (e.g., as part of an argument list), as described at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 and essentially duplicated in `read`'s POSIX spec at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html. – mklement0 Apr 17 '15 at 18:34
  • 2
    Thank you -- I'm going to want to review the source material to determine how best to revise that part of my answer. – Charles Duffy Apr 17 '15 at 19:08