
Believe it or not, I can't find the answer to what I'd think is a very basic question.

In awk, how can I loop over the input string character by character? Let's say I just wanted to print them out. Is there an array I can access? Or do I need to use substr?

Basically, something like:

echo "here is a string" | awk '
{ for(i=0; i<[length of input string]; i++) 
    printf [value at index i in array x]; 
}'

Frankly, I'm embarrassed.

Micha Wiedenmann
jasonmclose

4 Answers


You can convert a string to an array of characters by calling split with an empty separator:

echo "here is a string" | awk '
{ 
  split($0, chars, "")
  for (i=1; i <= length($0); i++) {
    printf("%s\n", chars[i])
  }
}'

This prints the characters vertically, one per line.

Michael J. Barber
  • actually, length() is a gawk extension AFAIK, it doesn't work on pure awk http://stackoverflow.com/questions/14720898/illegal-reference-to-an-array-in-awk-i-am-having-trouble-figuring-out-awk –  Apr 02 '14 at 10:03
  • @vaxquis I'm not sure what you mean by "pure" awk, but `length` is in POSIX. The gawk extension is in applying to arrays instead of strings. Fortunately, we can just switch `length(chars)` to `length($0)`. – Michael J. Barber Apr 02 '14 at 10:31
  • "pure" awk in the sense of "not any extended awk"... and yes, I meant this usage of length(); also, you can use "len = split(...)" and later "i<=len" with the same result. –  Apr 02 '14 at 10:34
  • one more question - this obviously ignores whitespaces in the input - is there any way to split the data so that I actually *know* where the whitespaces were? or do I have to parse each 'record' (line) separately to do that? –  Apr 02 '14 at 10:47
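The `len = split(...)` variant suggested in the comments can be sketched as follows. It assumes an awk that accepts an empty separator in split() (gawk, mawk and BWK awk do; POSIX leaves it unspecified), and the shell function name `chars_by_split` is hypothetical, chosen only for this example. Because split() returns the number of elements it created, the loop needs no separate length() call, and spaces come through as elements of their own.

```shell
# chars_by_split: hypothetical helper name used only for this sketch.
# Assumes split() with an empty separator yields one element per character
# (true for gawk, mawk and BWK awk; unspecified by POSIX).
chars_by_split() {
  awk '{
    n = split($0, chars, "")   # split() returns the element count
    for (i = 1; i <= n; i++)
      print chars[i]           # spaces are kept as elements too
  }'
}

echo "here is a string" | chars_by_split
```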

By default, awk's field separator (FS) is a single space, which tells awk to split fields on runs of spaces and tabs. Since you want to loop over each character rather than each word, we have to set FS to the empty string. Something like this:

[jaypal:~/Temp] echo "here is a string" | awk -v FS="" '
{for (i=1;i<=NF;i++) printf "Character %d: %s\n", i, $i}'
Character 1: h
Character 2: e
Character 3: r
Character 4: e
Character 5:  
Character 6: i
Character 7: s
Character 8:  
Character 9: a
Character 10:  
Character 11: s
Character 12: t
Character 13: r
Character 14: i
Character 15: n
Character 16: g
jaypal singh
  • hm. actually, it works when FS is set in the code, but in a bit different way... (eg. first line is not parsed) Any ideas why? –  Apr 02 '14 at 09:54
  • It's because the first line is already read before with default FS. – jaypal singh Apr 02 '14 at 11:46
  • @vaxquis You have to do it in BEGIN: `'BEGIN {FS="";}'` – alephreish Apr 14 '14 at 14:36
  • @vaxquis Setting the `FS` outside the code or in `BEGIN` is one and the same thing. `BEGIN` block is read only once before the first line of input is read. – jaypal singh Apr 14 '14 at 15:44
  • Note that the behavior of an empty `FS` in many implementations of awk is undefined: https://stackoverflow.com/questions/22044272/gawk-fs-to-split-record-into-individual-characters – Soren Bjornstad Jan 15 '19 at 12:17
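The BEGIN form from the comments, and the first-record effect reported in them, can be reproduced with a small sketch. The function names `count_fields_begin` and `count_fields_late` are hypothetical, and an empty FS is assumed to mean one field per character, which holds for gawk, mawk and BWK awk but is not required by POSIX. Each function prints NF for every input line, so the counts show which separator was used to split each record.

```shell
# Hypothetical helper names used only for this sketch.
# Assumes an empty FS splits a record into one field per character.

# FS set in BEGIN is already in effect when the first record is read.
count_fields_begin() {
  awk 'BEGIN { FS = "" } { print NF }'
}

# FS set in the main action only affects the NEXT record: the current
# record has already been read (and is split) with the default FS.
count_fields_late() {
  awk '{ FS = ""; print NF }'
}

printf 'ab c\nab c\n' | count_fields_begin   # prints 4, then 4
printf 'ab c\nab c\n' | count_fields_late    # prints 2, then 4
```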

Not all awk implementations support splitting on an empty string, which both of the above solutions rely on. In that case you can use the standard substr function:

echo "here is a string" | awk '{
  for (i = 1; i <= length($0); i++)
    printf "%s\n", substr($0, i, 1)
}'

P.S. In POSIX awk, length called without an argument defaults to $0, i.e. length and length($0) are equivalent.
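If you want an actual array of characters (as with split in the first answer) on an awk that lacks empty-string splitting, the substr loop can be wrapped in a user-defined function. This is a minimal sketch: `str_to_chars` and the shell wrapper `to_chars` are hypothetical names, and only the POSIX length() and substr() functions are used.

```shell
# to_chars / str_to_chars: hypothetical names used only for this sketch.
# Emulates split(s, arr, "") with POSIX length() and substr() only.
to_chars() {
  awk '
  # Fill arr[1..n] with the characters of s and return n.
  function str_to_chars(s, arr,    i, n) {
    n = length(s)
    for (i = 1; i <= n; i++)
      arr[i] = substr(s, i, 1)
    return n
  }
  {
    n = str_to_chars($0, chars)
    for (i = 1; i <= n; i++)
      print chars[i]
  }'
}

echo "here is a string" | to_chars
```

Entries left over from a longer previous line stay in `chars`, but the loop only reads up to the returned count, so they are never printed.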

Dimitre Radoulov

If you have gawk (gensub is a gawk extension):

awk '$0=gensub(/(.)/,"\\1\n","g")' file

test:

kent$  echo "I am a String"|awk '$0=gensub(/(.)/,"\\1\n","g")'
I

a
m

a

S
t
r
i
n
g
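For awks without gensub, a similar result can be sketched with POSIX gsub, where `&` in the replacement text stands for whatever was matched. The helper name `chars_by_gsub` is hypothetical. Using printf instead of print also avoids the extra empty line the gensub one-liner emits after each input line, since its rewritten $0 already ends in a newline before ORS is appended.

```shell
# chars_by_gsub: hypothetical helper name used only for this sketch.
# POSIX gsub() stands in for gawk's gensub(); "&" means "the matched text".
chars_by_gsub() {
  awk '{
    gsub(/./, "&\n")   # append a newline after every character
    printf "%s", $0    # printf, not print, so no extra ORS is added
  }'
}

echo "I am a String" | chars_by_gsub
```

One side effect of this design: an empty input line produces no output at all.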
Kent