I have an issue whenever I run this script. It gets the right character counts for the file, but the output it prints in the terminal has unwanted spacing.

#!/bin/bash
char=$(cat $1 | wc -c)
echo "This file has $char characters in it."
nolines=$(cat $1 | tr -d "\n" | wc -c)
echo "This file has $nolines characters not counting the new line."
emptyline=$(grep -cv '\S' $1)
echo "This file has $emptyline empty lines."
alphachar=$(tr -cd '[:alpha:]' < $1 | wc -c)
echo "This file has $alphachar alphanumeric characters."

I'm testing it with a file called example_file that has the following content:

This is the first line
This is the second
This has the symbols @#$

there was just an empty line.

So whenever I run my script like ~/script.sh example_file it gives an output of

This file has        93 characters in it.
This file has        88 characters not counting the new line.
This file has 1 empty lines.
This file has        70 alphanumeric characters.

I was expecting the output to have no extra spacing before the numbers.

rici
justin8p
  • Note that `grep` is not guaranteed to understand that `\S` means `[^[:space:]]` -- POSIX only requires it to understand BRE and ERE regex syntax forms, certainly not PCRE. – Charles Duffy Nov 10 '22 at 04:10
  • I might have gotten the code block for `example_file` wrong, if there's supposed to be an empty line in it, please [edit] the question. – Benjamin W. Nov 10 '22 at 04:11
  • Also, `cat $1 |` is a slower and buggier way of writing `<"$1"` (slower because it starts a separate `/bin/cat` executable and forces the thing on the right-hand side to read from a FIFO instead of direct from the input file; buggier because it can't handle filenames with spaces). Consider making a habit of doing the latter. – Charles Duffy Nov 10 '22 at 04:11
  • Look at the output of `wc -c`. Does it have extra spaces? If so, it's reasonable to expect your variables to themselves have those spaces. – Charles Duffy Nov 10 '22 at 04:12
  • (btw, are you sure you're really on Linux? I see extra spaces in the output from BSD `wc`, but not from a GNU coreutils one; the latter is way more popular on Linux). – Charles Duffy Nov 10 '22 at 04:15
  • What do you mean by _whenever I run it in the terminal_. Run **what** exactly? – user1934428 Nov 10 '22 at 08:39

1 Answer

Yes, wc is (sometimes) allowed to write spaces before the numeric result.

Consider the following behavior, seen on macOS:

$ /usr/bin/wc -c <<<"hello" | xxd
00000000: 2020 2020 2020 2036 0a                          6.

Each 20 is a space, the 36 is the digit 6, and the 0a is the trailing newline. The spaces are there because the POSIX standard for wc specifies that -c suppresses printing every number except the requested byte count, but it doesn't suppress printing spaces.

The behavior of this version of wc arguably violates the current version of the POSIX standard, which specifies that leading spaces should be suppressed when no filename is passed to wc as an argument; current GNU coreutils wc does not behave as the question describes.

Regardless, though, to turn it off you can just suppress whitespace using a parameter expansion such as ${char//[[:space:]]/}, or an arithmetic context, $((char)):

char=$(wc -c <"$1")
echo "This file has $((char)) characters in it."
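
To compare the options side by side on a system whose wc doesn't pad, here is a minimal sketch that fakes the padded value with printf (the width of 8 is an assumption picked to resemble the dump above, and the variable names are only illustrative):

```shell
#!/bin/bash
# Simulate the padded output of a BSD-style wc. The field width of 8 is an
# assumption for illustration, not something wc is required to use.
padded=$(printf '%8d\n' 6)

# Command substitution strips the trailing newline but keeps leading spaces.
echo "raw:        [$padded]"

# 1. Parameter expansion: delete every whitespace character (bash-specific).
stripped=${padded//[[:space:]]/}
echo "expansion:  [$stripped]"

# 2. Arithmetic context: the value is evaluated as a number.
arith=$((padded))
echo "arithmetic: [$arith]"

# 3. printf with %d parses the string as an integer before printing it.
formatted=$(printf '%d' "$padded")
echo "printf %d:  [$formatted]"
```

Only the first line should show the padding inside the brackets; the other three should print the bare number.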
Charles Duffy
  • Maybe also mention the benefits of avoiding the [useless `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) – tripleee Nov 10 '22 at 06:33
  • Maybe also explain the benefits of `printf` rather than `echo`: `printf 'This file has %d characters in it.\n' "$(wc -c <"$1")"`. Or an arithmetic casting: `char=$(("$(wc -c <"$1")"+0))` – Léa Gris Nov 10 '22 at 06:59
  • @LéaGris: or an integer declaration: `declare -i char=$(cat $1 | wc -c)`. – rici Nov 10 '22 at 08:35
  • @LéaGris, an arithmetic context is a great idea, thank you. Where is the `+0` necessary? – Charles Duffy Nov 10 '22 at 12:52
  • @CharlesDuffy the `+0` is not necessary. My brain farted in a rush probably. – Léa Gris Nov 10 '22 at 17:13
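
Putting the suggestions from the comments together, the question's script might be rewritten roughly as follows. This is only a sketch: `report_counts` is a hypothetical function name introduced here so the logic can be called directly, and the last message says "alphabetic" because `[:alpha:]` matches letters only.

```shell
#!/bin/bash
# Sketch of the question's script with the thread's suggestions applied:
# input redirection instead of cat, a POSIX bracket expression instead of \S,
# and printf %d so any padding from wc is not printed.
# report_counts is a hypothetical helper name, not part of the original script.
report_counts() {
    local file=$1
    local char nolines emptyline alphachar

    char=$(wc -c <"$file")
    nolines=$(tr -d '\n' <"$file" | wc -c)
    # [^[:space:]] is the portable spelling of \S; -v -c counts the lines
    # that contain no non-whitespace character.
    emptyline=$(grep -cv '[^[:space:]]' <"$file")
    # [:alpha:] matches letters only, so digits are not counted here.
    alphachar=$(tr -cd '[:alpha:]' <"$file" | wc -c)

    printf 'This file has %d characters in it.\n' "$char"
    printf 'This file has %d characters not counting the new line.\n' "$nolines"
    printf 'This file has %d empty lines.\n' "$emptyline"
    printf 'This file has %d alphabetic characters.\n' "$alphachar"
}

# Run against the first argument when one is given.
if [ "$#" -gt 0 ]; then
    report_counts "$1"
fi
```

Quoting `"$file"` everywhere also covers the filenames-with-spaces case raised in the comments on the question.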