
I was surprised to find that the following cut command:

for n in {1..10}; do echo "[$(echo ' a    b c   de f ' | cut -d' ' -f$n)]"; done

returns:

[]
[a]
[]
[]
[]
[b]
[c]
[]
[]
[de]

While I could probably rig up an awk command to get the desired output (the non-delimiter fields only), is there a way to use cut itself in a more intelligent manner?

I am looking for cut to output:

[a]
[b]
[c]
[de]
[f]

Update: I am getting answers that provide alternate ways (not using cut) to do this. That is not the aim of this post. For example, another way using awk is:

 echo "[$(echo ' a    b c   de f ' | awk -F' ' '{print $3}')]"

 [c]
WestCoastProjects

4 Answers


cut is an excellent tool for jobs where the delimiter is a single unchanging character. Parsing files like /etc/passwd and /etc/group falls into this category. Consider these lines from /etc/passwd:

sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin
messagebus:x:104:106::/var/run/dbus:/bin/false

Note that (1) the separator in these files is always a colon, :, and never varies, and (2) two consecutive colons mean that there is an empty field. This is what cut was designed for.
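To make the /etc/passwd case concrete, here is a small sketch that runs cut on the two sample lines above. It assumes only a POSIX cut; the lines are inlined rather than read from the real file.

```shell
# The two sample /etc/passwd lines from above, one per line.
lines='sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin
messagebus:x:104:106::/var/run/dbus:/bin/false'

# Fields 1 (user name) and 7 (login shell); cut rejoins them with ':'.
printf '%s\n' "$lines" | cut -d: -f1,7
# sshd:/usr/sbin/nologin
# messagebus:/bin/false

# Field 5 is empty because of the '::', and cut keeps it as an empty field.
printf '%s\n' "$lines" | cut -d: -f1,5,6
# sshd::/var/run/sshd
# messagebus::/var/run/dbus
```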

By default, the separator that cut uses is a tab. With -d, one can change the separator to any other single character, such as a space. But there is no way to tell cut that the separator can be either a tab or a space. There is also no way to tell cut to treat repeated separators as one. Repeated separators are always interpreted as delimiting empty fields.
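A short sketch of the tab-versus-space behavior, assuming a POSIX cut. The sample line is made up for illustration and is built with printf so that the tab is explicit in the source:

```shell
# 'a<TAB>b c': a tab between a and b, a space between b and c.
line=$(printf 'a\tb c')

# No -d: cut splits on tab only, so field 2 is "b c".
printf '%s\n' "$line" | cut -f2

# -d' ': the tab is now ordinary data, so field 1 is "a<TAB>b".
printf '%s\n' "$line" | cut -d' ' -f1

# cut cannot be told "tab or space"; mapping tabs to spaces first is one workaround.
printf '%s\n' "$line" | tr '\t' ' ' | cut -d' ' -f2   # prints "b"
```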

When the separators don't fit the above requirements, cut is the wrong tool.

When field separators require more flexibility, awk or the shell should be considered. By default, awk accepts any sequence of whitespace as a field separator. This can be customized, even to the point of using a regex as the field separator, by changing the FS variable. By default the shell likewise splits on any run of whitespace (space, tab, newline), and this can be changed to other characters, but not regexes, using the IFS variable.

As an example, here is an awk solution:

$ echo ' a    b c   de f ' | awk '{for (i=1;i<=NF;i++) print "["$i"]"}'
[a]
[b]
[c]
[de]
[f]
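The shell (IFS) and regex-FS behavior described above can be sketched as follows. This assumes bash, for read -a and the <<< here-string; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
s=' a    b c   de f '

# Shell: with the default IFS (space, tab, newline), read drops leading and
# trailing whitespace and treats each run of whitespace as one separator.
read -r -a fields <<< "$s"
printf '[%s]\n' "${fields[@]}"   # [a] [b] [c] [de] [f], one per line

# awk with a regex FS: a run of spaces counts as one separator, but unlike the
# default FS, leading and trailing separators still yield empty first and
# last fields (NF is 7 here, and $1 is empty).
printf '%s\n' "$s" | awk -F'[ ]+' '{ print "[" $1 "]", NF }'   # [] 7
```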

Making shell and awk work together

To transfer a shell variable to awk, it is simplest to use a -v variable assignment. For example, the following uses -v to assign the value of the shell variable n to an awk variable named m:

$ for n in {1..5}; do echo ' a    b c   de f ' | awk -v m=$n '{printf "[%s]\n", $m}'; done
[a]
[b]
[c]
[de]
[f]

Note that the awk code is all in single-quotes. This means that the shell does not mess with it. In the awk code, $m refers to the value of field number m. $m has nothing to do with any shell variable or shell substitution.
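If the call is needed repeatedly, the -v pattern can be wrapped in a shell function. The name nth_field is hypothetical, not a standard command; this is a sketch rather than a drop-in tool:

```shell
# Hypothetical helper: print field N (whitespace-delimited) of each input line.
nth_field() {
  # "$1" is expanded by the shell; $m inside the single-quoted awk program is
  # left alone by the shell and means "field number m" to awk.
  awk -v m="$1" '{ print $m }'
}

for n in 1 2 3 4 5; do
  printf '[%s]\n' "$(echo ' a    b c   de f ' | nth_field "$n")"
done
# [a]
# [b]
# [c]
# [de]
# [f]
```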

John1024
  • I could not figure out the quotation nesting for doing that in awk. OOC, I had been of the understanding that double quotes `"` nested inside of single quotes `'` (but not the other way around) would be taken as literals and thus the $i variable would not be properly evaluated – WestCoastProjects Jul 29 '15 at 02:11
  • @javadba Yes, that is correct: we do not want the shell to evaluate `$i`. We want the shell to treat `$i` as a literal. It is awk, __not__ the shell, that interprets the meaning of `$i`. To awk, `$i` is the contents of field number `i` and `i` is an awk variable (a number). Also, I added another example to the answer that shows how to pass shell variables into awk without complex quoting. – John1024 Jul 29 '15 at 03:39
  • Ah, OK. That explains a long-standing 'quarrel' I have had with awk. – WestCoastProjects Jul 29 '15 at 05:27
  • You seem to know something or other about this awk thing. – WestCoastProjects Jul 29 '15 at 05:28

Well, cut takes into account empty fields (and this is logical). If you have a string "a~bb~~c" (~ is a space), the 1st is "a", the 2nd is "bb", the 3rd is "" and the 4th is "c".
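The "a~bb~~c" example can be checked directly. A minimal sketch, with each ~ written as a real space:

```shell
s='a bb  c'   # "a~bb~~c" with every ~ replaced by a space
for n in 1 2 3 4; do
  printf '%s: [%s]\n' "$n" "$(printf '%s\n' "$s" | cut -d' ' -f"$n")"
done
# 1: [a]
# 2: [bb]
# 3: []
# 4: [c]
```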

You might want to use tr beforehand as shown here.

for n in {1..10}; do echo "[$(echo ' a    b c   de f ' | tr -s ' ' | cut -d' ' -f$n)]"; done
styko
  • So the tr -s ' ' is interesting: it *does* elide the multiple instances of the delimiter. But the array still has empty entries at the first and the last positions. – WestCoastProjects Jul 28 '15 at 23:01
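Building on the comment above, one way to also drop the empty first and last fields while still letting cut do the splitting is to strip the single leading and trailing space that tr -s leaves behind. A minimal sketch, assuming the input is separated by spaces only (no tabs):

```shell
s=' a    b c   de f '
for n in 1 2 3 4 5; do
  # tr -s squeezes each run of spaces to one space; sed then removes the one
  # remaining leading and trailing space, so cut sees "a b c de f".
  printf '[%s]\n' "$(printf '%s\n' "$s" | tr -s ' ' | sed 's/^ //; s/ $//' | cut -d' ' -f"$n")"
done
# [a]
# [b]
# [c]
# [de]
# [f]
```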

Not sure why you're using cut in a for loop, but you can get the desired output in bash with just:

$ for i in ' a    b c   de f '; do printf "[%s]\n" $i  ; done
[a]
[b]
[c]
[de]
[f]
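The loop above runs only once; the splitting comes from leaving $i unquoted, and printf reuses its format for each resulting word. A sketch of the same idea without the loop, assuming bash or another POSIX shell with the default IFS:

```shell
s=' a    b c   de f '
set -f               # disable pathname expansion so a field such as '*' stays literal
printf '[%s]\n' $s   # unquoted $s is split on IFS; printf repeats the format per word
set +f               # restore pathname expansion
```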
Nathan Wilson

Is this what you expect (bash shell)?

$ ar=(a b c de f)
$ for i in "${ar[@]}"; do echo "[$i]"; done
[a]
[b]
[c]
[de]
[f]

Or:

for i in {a,b,c,de,f}; do echo "[$i]"; done
[a]
[b]
[c]
[de]
[f]

Using cut here feels unnatural.

Gilles Quénot