How to iterate through the words of my text document in shell. I want to display number of words in my text document

Question

I tried the following, but it displays the number of lines instead.

declare -i x=0
while IFS="" read -r p || [ -n "$p" ]
do
    x=x+1
done <test.txt
echo "$x"

I would be thankful if someone could explain this, since I am a beginner.

Hi @tkausl. This is working. Thank you so much. But I want to iterate through the words, not only count them. Thanks in advance. — Chitti_the_robot, Sep 10 '18 at 06:42
Please, post some sample data with expected output to avoid misunderstanding of the question. — James Brown, Sep 10 '18 at 06:59
`for i in $(cat file); do something $i; done` instead of using read & redirections is probably the simplest solution — Sam, Sep 10 '18 at 07:17
@Sam `for i in $(cat file)` is a well-known anti-pattern. There is always a better solution than that. — Ed Morton, Sep 11 '18 at 14:30
What would be your preferred solution, and why? I am well aware that the pattern is frequently misused, but to me that alone does not mean it should never be used. — Sam, Sep 11 '18 at 14:48
@Sam, if it contains `*`, you get a list of filenames being iterated over. Why would you ever use it, when there are alternatives that don't have the side effects and bugs? `while read -r -a words; do for word in "${words[@]}"; do ...; done; done` — Charles Duffy, Sep 11 '18 at 15:54
you have a point in that it was reckless of me to suggest that without a reminder to toggle globbing with `set -f` / `set +f` if there is the slightest possibility the file may contain any special characters. — Sam, Sep 11 '18 at 17:00
do note however that `set -f; for i in $(cat file); do echo $i >/dev/null; done; set +f` times about twice as fast as the equivalent `while read -r -d' ' i; do echo $i >/dev/null; done` for a large file on my system and that the array solution may fail for very long lines. — Sam, Sep 11 '18 at 17:10

score 2 · Answer 1 · KamilCuk · 2018-09-10T09:31:31.710

Assuming your words are separated by tabs, spaces and newlines, the following snippet:

echo $'word1 word2! word3
\tword4\t\t\t\t\t\tword5\tword6
word7 word8


word9 word10' | \
while IFS=$'\t ' read -ra linewords; do
    for i in "${linewords[@]}"; do
        echo word is "'$i'"
    done
done

will output:

word is 'word1'
word is 'word2!'
word is 'word3'
word is 'word4'
word is 'word5'
word is 'word6'
word is 'word7'
word is 'word8'
word is 'word9'
word is 'word10'

It sets IFS to several delimiter characters (tab and space) and has read split each line into an array; see this answer on how to split a string on a delimiter.
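Since the question also asks for the number of words, the same loop can keep a counter while it iterates. The following is only a minimal sketch, assuming bash; the function name `count_words` and its file argument are illustrative and not part of the answer above.

```bash
#!/usr/bin/env bash
# Sketch: walk every tab/space-separated word in a file and count them.
# "count_words" is a hypothetical helper name chosen for this example.
count_words() {
    local file=$1 word
    local -a linewords
    local total=0
    # The "|| [ ... ]" part keeps a final line that has no trailing newline.
    while IFS=$' \t' read -ra linewords || [ "${#linewords[@]}" -gt 0 ]; do
        for word in "${linewords[@]}"; do
            total=$((total + 1))   # per-word work would go here
        done
    done < "$file"
    printf '%d\n' "$total"
}
```

Calling `count_words test.txt` prints the total. Because `read` does not perform filename expansion, a word such as `*` is counted as an ordinary word.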

edited Sep 10 '18 at 09:31

answered Sep 10 '18 at 07:11


You chose a convenient input for which your code works :) Try to use a tab between `word5` and `word6` instead of the space. The issue is that you want to use `$'...'` instead of `$"..."`. See [manual](https://www.gnu.org/software/bash/manual/bash.html#Locale-Translation) for explanation of `$"..."`. Also, since `read` reads lines by default, the `\n` is not necessary. – PesaThe Sep 10 '18 at 09:24

score 1 · Answer 2 · answered Sep 10 '18 at 07:24

I'd use awk for that:

$ echo "Lorem ipsum dolor sit amet,
        consectetur adipisci elit,
        ..." | 
awk '{
    for(i=1;i<=NF;i++)
        print "iterating " $i
}'

Output:

iterating Lorem
iterating ipsum
iterating dolor
iterating sit
iterating amet,
iterating consectetur
iterating adipisci
iterating elit,
iterating ...
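If only the count is needed, the same field splitting can be reused by summing NF over all lines. This is a hedged sketch rather than part of the answer; the wrapper name `awk_word_count` is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: add up awk's per-line field count (NF) to get the word total.
# "awk_word_count" is a hypothetical wrapper name.
awk_word_count() {
    # "total + 0" forces a numeric 0 when the file is empty.
    awk '{ total += NF } END { print total + 0 }' "$1"
}
```

With awk's default field separator, runs of spaces and tabs count as a single separator, so blank lines contribute nothing to the total.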

score 0 · Answer 3 · answered Sep 10 '18 at 08:34

grep -oE '\w+' YOUR_FILE.txt

writes the words in YOUR_FILE.txt to standard output. Pipe this into your loop, and you have an iteration over the words.

This assumes that a "word" in your case is one or more characters described by \w, i.e. either an underscore or what your current locale defines to be an alphanumeric character. If your idea of a "word" is different, you can of course tailor the regular expression according to your needs.
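Piping the grep output into a loop could look like the sketch below. It assumes a grep that understands `\w` together with `-o` (GNU grep does); the function name `grep_each_word` is invented for the example.

```bash
#!/usr/bin/env bash
# Sketch: feed one word per line from grep into a while-read loop.
# "grep_each_word" is a hypothetical helper name.
grep_each_word() {
    grep -oE '\w+' "$1" | {
        count=0
        while IFS= read -r word; do
            count=$((count + 1))
            printf 'word %d: %s\n' "$count" "$word"
        done
        # Printed inside the braces: the pipeline runs in a subshell,
        # so "count" would not be visible after the closing brace.
        printf 'total: %d\n' "$count"
    }
}
```

The braces group the loop and the final `printf` on the receiving side of the pipe, which is why the total is still available when it is printed.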
