loop over characters in input string using awk

Question

Believe it or not, I can't find the answer to what I would think is a very basic question.

In awk, how can I loop over the input string character by character? Let's say I just wanted to print them out. Is there an array I can access? Or do I need to use substr?

Basically, something like:

echo "here is a string" | awk '
{ for(i=0; i<[length of input string]; i++) 
    printf [value at index i in array x]; 
}'

Frankly, I'm embarrassed.

Michael J. Barber · Accepted Answer · 2014-04-02T10:31:33.770

49

You can convert a string to an array using split:

echo "here is a string" | awk '
{ 
  split($0, chars, "")
  for (i=1; i <= length($0); i++) {
    printf("%s\n", chars[i])
  }
}'

This prints the characters vertically, one per line.
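As a variant of the same idea, the sketch below keeps the value returned by split (the number of elements it stored) and uses that as the loop bound, so neither length($0) nor the gawk-only length(chars) is needed. The function name chars_via_split is made up for illustration. It also assumes an awk in which an empty separator splits a string into single characters: gawk documents this as an extension, but POSIX leaves the behavior unspecified, so treat it as an assumption rather than a guarantee.

```shell
# Hypothetical helper: print each character of every input line on its own line.
# Assumption: this awk splits on an empty separator into single characters
# (gawk does; POSIX leaves the result unspecified).
chars_via_split() {
  awk '{
    n = split($0, chars, "")   # n is the number of elements stored in chars
    for (i = 1; i <= n; i++)
      print chars[i]
  }'
}

echo "a b" | chars_via_split
```

Because the space ends up as its own array element, the loop visits it like any other character.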

edited Apr 02 '14 at 10:31

answered Dec 19 '11 at 15:53

Michael J. Barber


1

actually, length() is a gawk extension AFAIK, it doesn't work on pure awk http://stackoverflow.com/questions/14720898/illegal-reference-to-an-array-in-awk-i-am-having-trouble-figuring-out-awk – Apr 02 '14 at 10:03
1

@vaxquis I'm not sure what you mean by "pure" awk, but `length` is in POSIX. The gawk extension is in applying to arrays instead of strings. Fortunately, we can just switch `length(chars)` to `length($0)`. – Michael J. Barber Apr 02 '14 at 10:31
2

"pure" awk in the sense of "not any extended awk"... and yes, I meant this usage of length(); also, you can use "len = split(...)" and later "i<=len" with the same result. – Apr 02 '14 at 10:34
one more question - this obviously ignores whitespaces in the input - is there any way to split the data so that I actually *know* where the whitespaces were? or do I have to parse each 'record' (line) separately to do that? – Apr 02 '14 at 10:47

score 24 · Answer 2 · answered Dec 19 '11 at 16:18

24

By default, awk's field separator (FS) is a single space, which makes awk split each record on runs of spaces and tabs. Since you want to loop over each character rather than each word, redefine FS as the empty string so that every character becomes its own field. Something like this:

[jaypal:~/Temp] echo "here is a string" | awk -v FS="" '
{for (i=1;i<=NF;i++) printf "Character %d: %s\n", i, $i}'
Character 1: h
Character 2: e
Character 3: r
Character 4: e
Character 5:  
Character 6: i
Character 7: s
Character 8:  
Character 9: a
Character 10:  
Character 11: s
Character 12: t
Character 13: r
Character 14: i
Character 15: n
Character 16: g

answered Dec 19 '11 at 16:18

jaypal singh


1

hm. actually, it works when FS is set in the code, but in a bit different way... (eg. first line is not parsed) Any ideas why? – Apr 02 '14 at 09:54
3

It's because the first line is already read before with default FS. – jaypal singh Apr 02 '14 at 11:46
3

@vaxquis You have to do it in BEGIN: `'BEGIN {FS="";}'` – alephreish Apr 14 '14 at 14:36
1

@vaxquis Setting the `FS` outside the code or in `BEGIN` is one and the same thing. `BEGIN` block is read only once before the first line of input is read. – jaypal singh Apr 14 '14 at 15:44
Note that the behavior of an empty `FS` in many implementations of awk is undefined: https://stackoverflow.com/questions/22044272/gawk-fs-to-split-record-into-individual-characters – Soren Bjornstad Jan 15 '19 at 12:17
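To make the BEGIN point from the comments concrete, here is a minimal sketch that sets FS inside the script rather than with -v. The function name count_chars_per_line is hypothetical, and the example assumes an awk that treats an empty FS as "one field per character", which, as noted above, is not guaranteed everywhere.

```shell
# Hypothetical helper: report how many single-character fields each line has.
# Assumption: this awk treats an empty FS as "one field per character".
count_chars_per_line() {
  awk 'BEGIN { FS = "" }   # runs once, before the first record is read and split
       { print NF }'
}

printf 'ab\ncde\n' | count_chars_per_line
```

If FS were assigned in the main rule instead, the first record would already have been split with the default separator by the time the assignment ran, which is the effect described in the comments.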

score 8 · Answer 3 · answered Dec 19 '11 at 20:04

Not all awk implementations support the above solutions. In that case you could use substr:

echo "here is a string" | awk '{
  for (i = 1; i <= length($0); i++)
    printf "%s\n", substr($0, i, 1)
}'

P.S. POSIX specifies that length without an argument defaults to $0, so length and length($0) are equivalent.
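The same substr loop can do more than print. As a sketch, the function below (the name space_positions is made up for illustration) reports the 1-based position of every space or tab in each line, using only length and substr on strings, both of which are in POSIX.

```shell
# Hypothetical helper: print the 1-based index of each space or tab in a line.
# Uses only length() and substr() on strings, so no empty-separator tricks.
space_positions() {
  awk '{
    len = length($0)               # compute the length once per record
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)         # the single character at position i
      if (c == " " || c == "\t")
        print i
    }
  }'
}

echo "here is a string" | space_positions
```

For the sample string this reports 5, 8, and 10, one per line, which lines up with the blank entries in the FS-based output shown in the earlier answer.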

score 2 · Answer 4 · answered Dec 19 '11 at 16:17

2

if you have gawk:

awk '$0=gensub(/(.)/,"\\1\n","g")' file

test:

kent$  echo "I am a String"|awk '$0=gensub(/(.)/,"\\1\n","g")'
I

a
m

a

S
t
r
i
n
g

answered Dec 19 '11 at 16:17

Kent


is there a way to do something with each character using this method or is it just reformatting the string? – user.friendly Jun 27 '16 at 18:28
1

it can "do something with each char". but it depends what is "something" – Kent Jun 27 '16 at 20:23
