
I am dealing with a file where fields are separated by a single space.

awk interprets the FS " " as "one or more whitespace characters", which misreads my file when one of the fields is empty.
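A minimal reproduction of the problem, assuming a hypothetical record `a  c` (the values "a", an empty field and "c", each separated by one space). With the default FS, awk reports two fields and `c` is read as `$2` instead of `$3`:

```shell
# Hypothetical record: "a", an empty field, then "c", separated by single spaces.
# With the default FS=" ", the two adjacent spaces collapse into one separator.
printf '%s\n' 'a  c' | awk '{ print NF, $2 }'
# prints: 2 c   (the empty field is lost and "c" becomes field 2)
```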

I tried using "a space not followed by a space" (" (?! )") as FS, but awk does not support negative lookahead. Simple Google queries like "single space field separator awk" only sent me to the manual page explaining the special treatment of FS=" ". I must have missed the relevant manual page...

How can I use a single space as field separator with awk?

asachet
  • I had actually opened the relevant manual page... https://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html – asachet Mar 15 '16 at 10:17
  • `FS can be set to "[ ]" to use a single space as field separator.` – asachet Mar 15 '16 at 10:18
  • `awk -F'[ ]' '{printf ">%s<",$2}' <<< 'a b'` doesn't work for me, I'm using gawk – hek2mgl Mar 15 '16 at 12:42
  • Uh, looks like the comment system slurped the newlines. I actually use `<<< 'a[space][space][space][space]b'` as input – hek2mgl Mar 15 '16 at 12:44
  • I meant "slurped the whitespace". – hek2mgl Mar 15 '16 at 12:50
  • @hek2mgl I get the output I expect, i.e. `><`. If that is not what you expect, your *expectations* are wrong. – tripleee Mar 15 '16 at 13:12
  • @antoine-sac Can you post a few lines of your content and what you expect the fields to be? Please pay close attention to the empty field: is it a null string or additional whitespace? – karakfa Mar 15 '16 at 13:20
  • @tripleee Please read the question carefully. The OP reports that his data contains fields consisting only of whitespace. – hek2mgl Mar 15 '16 at 13:48
  • @hek2mgl I have reread the question twice and if anything, I think your reading is incorrect. The OP specifically asks about an empty field. I see nothing about fields containing whitespace. – tripleee Mar 15 '16 at 13:52
  • @tripleee Yeah, looks like you are right. My fault. – hek2mgl Mar 15 '16 at 14:05
  • @tripleee @hek2mgl As tripleee says, I never have whitespace in my fields, but some fields are empty. Setting FS to `"[ ]"` worked for me, and I am using gawk! Thanks for your help. – asachet Mar 15 '16 at 14:49
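The fix confirmed in the comments, FS set to "[ ]", can also be set in a BEGIN block instead of with -F. A minimal sketch, assuming hypothetical three-column data (id, name, city) where the second record has an empty name:

```shell
# Hypothetical sample: the second line has an empty middle field (two adjacent spaces).
printf '%s\n' '1 alice paris' '2  berlin' |
awk 'BEGIN { FS = "[ ]" }   # bracket expression: each single space is a separator
     { printf "id=%s name=<%s> city=%s\n", $1, $2, $3 }'
# id=1 name=<alice> city=paris
# id=2 name=<> city=berlin
```

Setting FS in BEGIN and passing -F'[ ]' are equivalent here; BEGIN just keeps the separator inside the awk program.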

2 Answers


This should work:

$ echo 'a    b' | awk -F'[ ]' '{print NF}'
5

whereas this treats all contiguous whitespace as a single separator:

$ echo 'a    b' | awk -F' ' '{print NF}'
2
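Printing each field between brackets makes the difference visible. This is only a debugging aid; looping over `NF` is an illustrative choice:

```shell
# Single-space separator: the three extra spaces become three empty fields.
echo 'a    b' | awk -F'[ ]' '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
# [a][][][][b]

# Default separator: the whole run of spaces is one separator.
echo 'a    b' | awk '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
# [a][b]
```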

Based on the comments, this needs special consideration: an empty string and whitespace as a field value are very different things, and fields containing whitespace are probably not a good match for whitespace-separated content.

I would suggest preprocessing with cut and changing the delimiter, for example:

$ echo 'a    b' | cut -d' ' -f1,3,5 --output-delimiter=,
a,,b
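If cut is not an option (`--output-delimiter` is a GNU coreutils extension, not POSIX), awk itself can change the delimiter by setting OFS. A sketch that mirrors the cut example above:

```shell
# Pick fields 1, 3 and 5 and join them with commas, like cut -d' ' -f1,3,5.
echo 'a    b' | awk -F'[ ]' -v OFS=, '{ print $1, $3, $5 }'
# a,,b

# Convert every separator: assigning $1 = $1 forces awk to rebuild $0 using OFS.
echo 'a    b' | awk -F'[ ]' -v OFS=, '{ $1 = $1; print }'
# a,,,,b
```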
karakfa
  • It sounds like OP wants `3` instead of `5` because the two spaces enclosed by the delimiter are a field - a field containing two spaces. – hek2mgl Mar 15 '16 at 12:48
  • @hek2mgl Uh, no, there is no way for a field to contain the field separator. When the field separator is a single space, two adjacent spaces are separators around an empty field. – tripleee Mar 15 '16 at 13:13
  • @tripleee Yeah. I still think the question is interesting, I mean it is generally a valid use case, however I would have chosen a different delimiter in that case. – hek2mgl Mar 15 '16 at 13:49
  • @karakfa Nice idea to change the delimiters, but I don't have any whitespace in my fields, so the first solution was sufficient. Thanks! – asachet Mar 15 '16 at 14:52

To give a couple of helpful references from the gawk manual for this behaviour:

Default Field Splitting explains that " " is the default value, but carries a special meaning:

The default value of the field separator FS is a string containing a single space, " ".

If awk interpreted this value in the usual way, each space character would separate fields, so two spaces in a row would make an empty field between them.

The reason this does not happen is that a single space as the value of FS is a special case—it is taken to specify the default manner of delimiting fields.
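The special case does more than merge adjacent spaces: leading and trailing blanks are ignored, and tabs also separate fields. A quick comparison, assuming a hypothetical record with a leading space, a tab, a double space and a trailing space:

```shell
# Record: leading space, "a<TAB>b", two spaces, "c", trailing space.
# Default FS=" ": leading/trailing blanks are skipped; spaces and tabs both split.
printf ' a\tb  c \n' | awk '{ print NF }'
# 3   (a, b, c)

# FS="[ ]": every single space splits, and the tab is an ordinary character.
printf ' a\tb  c \n' | awk -F'[ ]' '{ print NF }'
# 5   ("", "a<TAB>b", "", "c", "")
```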

Regexp Field Splitting explains how to delimit fields with a single space:

For a less trivial example of a regular expression, try using single spaces to separate fields the way single commas are used. FS can be set to "[ ]" (left bracket, space, right bracket).

This regular expression matches a single space and nothing else (see Regular Expressions).

(Added the emphasis and paragraphing.)
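The same rules apply to the separator argument of split(): a single-space string gets the default treatment, while "[ ]" splits on each space. A minimal check (the array names `d` and `s` are arbitrary):

```shell
awk 'BEGIN {
    n_default = split("a  b", d, " ")     # special case: runs of blanks are one separator
    n_single  = split("a  b", s, "[ ]")   # regexp matching exactly one space
    print n_default, n_single             # 2 3
    print "<" s[2] ">"                    # <>   (the empty middle field)
}'
```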

mwfearnley