
I need some help with shell programming.

I need to write a shell script that accepts multiple text files as arguments and counts the word occurrences across all of them.

For example, file1.txt contains the text

mary had a little lamb. His fleece was white as a snow. And everywhere that mary went.

and file2.txt contains

Mary had a little lamb. Hello How are you

So the script should give output like

Mary 2
Had 2
a  2
white 1
.
.
.

Thanks in advance

sehe
  • Take a look at [this](http://stackoverflow.com/a/15108286/1387612) answer – janisz Jan 20 '15 at 22:29
  • Hi, I have tried something like this: `cat $@ | tr -s ' ' '\n' | uniq -c | sort -nr`, but it does not combine the values – user3809572 Jan 20 '15 at 22:33
  • @user3809572 Just sort before uniq as well: `cat $@ | tr -s ' ' '\n' | sort | uniq -c | sort -nr`. Tested: OK – sehe Jan 20 '15 at 22:38
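
The first attempt does not combine the counts because `uniq -c` only collapses adjacent identical lines, so equal words have to be brought together by `sort` first. A minimal sketch of the difference, using made-up input rather than the files from the question (the function names are hypothetical):

```shell
#!/bin/sh

# Without a prior sort, uniq -c counts only runs of adjacent duplicates.
count_unsorted() {
    tr -s ' ' '\n' | uniq -c
}

# Sorting first makes equal words adjacent, so each word gets one total.
count_sorted() {
    tr -s ' ' '\n' | sort | uniq -c | sort -nr
}

echo "mary had mary" | count_unsorted   # "mary" appears on two lines, count 1 each
echo "mary had mary" | count_sorted     # "mary" appears on one line, count 2
```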

2 Answers


How about

cat file*.txt | 
    xargs -n1 |
    awk '{h[$1]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' 

Which prints

3 a
2 mary
2 little
2 lamb.
2 had
1 you
1 white
1 went.
1 was
1 that
1 snow.
1 Mary
1 How
1 His
1 Hello
1 fleece
1 everywhere
1 as
1 are
1 And
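
One caveat with `xargs -n1` as a word splitter: xargs gives quotes and backslashes special meaning, so text containing an apostrophe (for example `don't`) can make it stop with an unmatched-quote error. A sketch of an alternative that lets awk split the fields itself; the function name `top_words` is made up for this example:

```shell
#!/bin/sh

# top_words FILE...: print the 20 most frequent whitespace-separated words.
top_words() {
    awk '{ for (i = 1; i <= NF; i++) h[$i]++ }      # count every field on every line
         END { for (w in h) print h[w], w }' "$@" |
        sort -rn | head -20
}
```

It would be called as `top_words file*.txt`. Like the pipeline above, it is case-sensitive and keeps punctuation attached to words.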

It's adapted from a script explained on this website which goes on to make a graph of it:

cat file*.txt | xargs -n1 | awk '{h[$1]++}END{for(i in h){print h[i],i|"sort -rn|head -20"}}' |awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'

Printing

          a     3 ############################################################ 
       mary     2 ######################################## 
     little     2 ######################################## 
      lamb.     2 ######################################## 
        had     2 ######################################## 
        you     1 #################### 
      white     1 #################### 
      went.     1 #################### 
        was     1 #################### 
       that     1 #################### 
      snow.     1 #################### 
       Mary     1 #################### 
        How     1 #################### 
        His     1 #################### 
      Hello     1 #################### 
     fleece     1 #################### 
 everywhere     1 #################### 
         as     1 #################### 
        are     1 #################### 
        And     1 #################### 
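
The graphing stage scales each count to a bar of at most 60 characters, relative to the largest count, which is taken from the first input line because the input is already sorted in descending order. A more readable sketch of the same idea, with a hypothetical function name `bar_chart`; unlike the one-liner it rounds fractional bar lengths down:

```shell
#!/bin/sh

# bar_chart: read "count word" lines (sorted descending) and draw scaled bars.
bar_chart() {
    awk -v width=60 '
        NR == 1 { max = $1 }                 # first line carries the largest count
        {
            n = int(width * $1 / max)        # bar length, rounded down
            bar = ""
            for (i = 0; i < n; i++) bar = bar "#"
            printf "%15s %5d %s\n", $2, $1, bar
        }'
}
```

For example, `printf '3 a\n2 mary\n1 you\n' | bar_chart` draws bars of 60, 40 and 20 characters.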
sehe
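#!/bin/sh

str=""
for i in "$@"    # quoted so file names containing spaces stay intact
do
    str="${str}$(sed 's|\.||g' "$i") " # remove the periods and add a space between files
done
# $str is deliberately unquoted: word splitting collapses newlines and tabs to single spaces
echo $str | tr -s ' ' '\n' | sort | uniq -c | sort -nr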

$ thescript file1.txt file2.txt

Output:

  3 a
  2 mary
  2 little
  2 lamb
  2 had
  1 you
  1 white
  1 went
  1 was
  1 that
  1 snow
  1 Mary
  1 How
  1 His
  1 Hello
  1 fleece
  1 everywhere
  1 as
  1 are
  1 And
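
The output above still counts `mary` and `Mary` separately, whereas the question's expected output merges them. A sketch of one way to fold case and strip punctuation before counting, assuming a word consists only of letters, digits and apostrophes; `count_words` is a hypothetical name:

```shell
#!/bin/sh

# count_words FILE...: case-insensitive word counts, most frequent first.
count_words() {
    cat -- "$@" |
        tr '[:upper:]' '[:lower:]' |     # fold everything to lower case
        tr -cs "[:alnum:]'" '\n' |       # each run of other characters becomes one newline
        sed '/^$/d' |                    # drop an empty line left by leading punctuation
        sort | uniq -c | sort -nr
}
```

Called as `count_words file1.txt file2.txt` on the two sample files, this reports `mary` with a count of 3, since both spellings and all three occurrences are merged.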
Jared Rummler