
Problem

I would like to run a command on each character of a string in a shell script (/bin/bash). In the case below I pass Chinese characters to the script via "$@", and the string contains no spaces or other separators. I am considering using the string length and indexing each position in the string. Here is what I have so far (note that rdef is a custom command I created):

PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin/: 
export PATH
for f in "$@"
do
    # need to loop through the input and perform an action on each index of the $f variable
    rdef "$f"|awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'
done

Example invocation of rdef:

rdef 快乐

Standard output of rdef:

Definition of <快乐>: | kuài lè |
happy
merry

Update

Although the other question is similar, the context is not the same. In this case I need to split a string that is passed to a script as an argument, and I need to feed each piece of the split string to a chained set of commands. Both points present nuances not covered in the related question.

I've tried the following code, which does not seem to work with Chinese characters. When I plug in ASCII characters, the command executes and returns correct results.

PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin/: 
export PATH


for f in "$@"

do
    foo="$f"

    for (( i=0; i<${#foo}; i++ )); do
        rdef "${foo:$i:1}"|awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'
    done

done

NB:

My final command line should enable me to execute the custom command chained to awk on each letter:

rdef "$letter-var"|awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'

More information on rdef can be found at the following OS question

Solution

All of the solutions offered worked well. I chose the option offered by @kojiro because he pointed me in the right direction: a UTF-8 locale is required. That was an important discovery, as the multi-byte encoding of the Chinese characters was corrupting execution of the loop.

PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin/: 
export PATH
LC_CTYPE=UTF-8
x=$1

for ((i=0;i<${#x};i++)); do rdef "${x:i:1}" | awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'; done
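To illustrate why the locale matters here, the following is a minimal sketch, not part of the original script. In UTF-8 each of the two characters in 快乐 occupies three bytes, so a shell running in a byte-oriented locale sees six "characters" and slices them in the middle. The helper names byte_len and char_len are hypothetical, and the sketch assumes the script file itself is saved as UTF-8.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original script.

# Length in bytes: the C locale treats every byte as one character.
byte_len() {
  local LC_ALL=C
  printf '%s\n' "${#1}"
}

# Length in characters under whatever locale the caller has active.
char_len() {
  printf '%s\n' "${#1}"
}

s='快乐'
echo "bytes: $(byte_len "$s")"   # 6 when this file is UTF-8 encoded
echo "chars: $(char_len "$s")"   # 2 only if a UTF-8 locale is active
```

If the second number matches the first, the shell is not using a UTF-8 locale and "${x:i:1}" will return single bytes. The value LC_CTYPE=UTF-8 is the name used on OS X; on other systems the locale is usually named something like en_US.UTF-8, which is an assumption to check with `locale -a`.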
Tommie C.
  • Any reason you're not just doing a while/read loop with the `-n 1` flag for read? [This](http://stackoverflow.com/q/10551981/3076724) seems to be on point as well, though a bit confused why no one listed a `read -n 1` solution. – Reinstate Monica Please Mar 22 '14 at 20:15
  • What encoding are the Chinese characters you're working with? – kojiro Mar 22 '14 at 20:51
  • You should quote `"${foo:$i:1}"` (wouldn't matter with ascii characters, might with others) – Reinstate Monica Please Mar 22 '14 at 20:52
  • You've wrapped the question as "perform action on each letter…" but you're really asking for help debugging a larger-context problem. Please give the definition of `rdef` and sample inputs and outputs. – kojiro Mar 22 '14 at 21:53
  • @kojiro I've updated the question with a reference to the rdef command but essentially it just reads data from the OSX dictionary and outputs some data. – Tommie C. Mar 22 '14 at 21:59
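
For reference, the `read -n 1` loop mentioned in the comments could look roughly like the sketch below. The function name each_char is hypothetical. Whether `read -n 1` returns a whole multi-byte character depends on the bash version and on a UTF-8 locale being active, so treat that part as an assumption to verify on your system.

```shell
#!/bin/bash
# Sketch of the `read -n 1` approach; each_char is a hypothetical name.
each_char() {
  local ch
  # IFS= keeps spaces, -r keeps backslashes, -n 1 reads one character.
  while IFS= read -r -n 1 ch; do
    # The here-string appends a newline, which read reports as an empty value.
    if [ -n "$ch" ]; then
      printf '%s\n' "$ch"
    fi
  done <<< "$1"
}

each_char 'a b'
```

Replacing the printf line with the rdef pipeline from the question would run it once per character.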

4 Answers


Bash has substring expansion built in:

$ x='红楼梦'
$ for ((i=0;i<${#x};i++)); do echo "${x:i:1}"; done
红
楼
梦
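
The same loop can be wrapped in a small function so that any command is applied to each character. This is only a sketch: for_each_char is a hypothetical helper name, and it relies on the shell running in a UTF-8 locale for non-ASCII input.

```shell
#!/bin/bash
# for_each_char is a hypothetical helper, not from the answer above.
# Usage: for_each_char STRING COMMAND [ARGS...]
for_each_char() {
  local str=$1 i
  shift
  for (( i = 0; i < ${#str}; i++ )); do
    # Quote the slice so whitespace and glob characters survive intact.
    "$@" "${str:i:1}"
  done
}

for_each_char 'abc' printf '<%s>\n'
```

With the asker's custom command, a call along the lines of `for_each_char "$1" rdef` piped into the awk filter should behave like the loop in the question.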
kojiro
  • Forget the previous comments. This does work when I set the environment variable LC_CTYPE=UTF-8 in the script. Thanks for your help! – Tommie C. Mar 23 '14 at 19:46

You could use awk to execute a command on each letter.

echo "XXXXX" \
| awk -v FS="" '{ for( I=1 ; I <= NF ; I++ ){ system( "command " $I ) } }'
  • FS="" tells awk that each character is a separate field.
  • The for loop iterates over the characters and executes the command.
  • You need to replace command with the command you want to execute.

For example:

echo "いい天気ですね " \
| awk -v FS="" '{ for( I=1 ; I <= NF ; I++ ){ system( "echo \"x" $I "x\"" ) } }'

Will display:

xいx
xいx
x天x
x気x
xでx
xすx
xねx
x x

You will need an awk with multibyte character support.
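
A possible variant, sketched below, avoids two things the version above depends on: an empty FS (whose behaviour is not specified by POSIX) and system(), which builds a shell command line by string concatenation, so quotes or `$` in the input are interpreted by the shell. Here awk only prints one character per line with substr(), and the shell runs the command. The name chars_via_awk is hypothetical, and an awk with multibyte support is still assumed for non-ASCII input.

```shell
#!/bin/bash
# chars_via_awk is a hypothetical helper name.
chars_via_awk() {
  printf '%s\n' "$1" |
    awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }'
}

# Run a shell command per character without awk's system().
chars_via_awk 'abc' | while IFS= read -r ch; do
  printf 'x%sx\n' "$ch"
done
```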

  • FYI if you end a line with `|` bash knows the command will continue on the next line, so `echo "いい天気ですね " |` would avoid the need for backslash-escaping the newline. – kojiro Mar 22 '14 at 20:47
  • I prefer the pipe to be at the beginning of the line to better visualize how the commands are chained. But I agree this is debatable. –  Mar 22 '14 at 23:28

You can employ perl too:

perl -C -lnE 'say for split //' <<<"红楼梦"

prints

红
楼
梦
clt60
  • Nothing was wrong with your solution. Earlier comment was a moment of brain death... (I use chaining in my problem description). Thinking too hard about the problem. The issue turned out to be setting the proper character encoding for the shell. I previously up voted your answer because it works. – Tommie C. Mar 23 '14 at 20:28
  • @TommieC. i only was surprised with a comment about _how to pipe_..;) thanx. – clt60 Mar 23 '14 at 20:30

You could use sed to add the missing spaces, which would make your for loop iterate on each character:

for f in $( echo "$*" | sed -e 's/\(.\)/\1 /g' )
do
  ...
done
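
Because this approach relies on unquoted word splitting, it has two side effects worth knowing: spaces already present in the input disappear, and glob characters such as `*` would be expanded unless globbing is switched off. The sketch below shows both; split_with_sed is a hypothetical wrapper name, and sed needs a UTF-8 locale to treat a multi-byte character as a single `.`.

```shell
#!/bin/bash
# split_with_sed is a hypothetical wrapper around the sed idea above.
split_with_sed() {
  (
    set -f   # disable globbing so a '*' in the input is not expanded
    # Unquoted on purpose: word splitting is what separates the characters.
    for ch in $(printf '%s\n' "$1" | sed -e 's/\(.\)/\1 /g'); do
      printf '[%s]\n' "$ch"
    done
  )
}

split_with_sed 'abc'
split_with_sed 'a b'   # the space in the input is lost
```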