Bash - Count frequency of palindromes from text file

Question

This is a follow-up to my other post: Printing all palindromes from text file

I want to print the number of times each palindrome occurs in my text file, like a frequency table. Each line should show the count followed by the word, in this format:

100  did
32   sas
17   madam

My code right now is:

#!/usr/bin/env bash

function search
{
    grep -oiE '[a-z]{3,}' "$1" | sort -n | tr '[:upper:]' '[:lower:]' | while read -r word; do
        [[ $word == $(rev <<< "$word") ]] && echo "$word" | uniq -c
    done
}
search "$1"

Compared with my last post (Printing all palindromes from text file), I have added "sort -n" and "uniq -c". As far as I know, "sort -n" sorts the palindromes found in alphabetical order, and "uniq -c" prints the number of occurrences of each word found.

To test the script I have a file named "testingfile.txt", which contains:

testing words testing words testing words 
palindromes
Sas
Sas
Sas
sas
bob
Sas
Sas
Sas Sas madam
midim poop goog tot sas did i want to go to the movies did
otuikkiuto

pop
poop

This file is just for testing before I try the script on a much larger file, which will take much longer.

When I type this in the console ("palindrome" is the name of my script):

source palindrome testingfile.txt

The output appears like this:

1 bob
1 did
1 did
1 goog
1 madam
1 midim
1 otuikkiuto
1 poop
1 poop
1 pop
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 tot

Is there something I am missing? This is the result that I want:

9 sas
2 did
2 poop
1 bob
1 goog
1 madam
1 midim
1 otuikkiuto
1 pop
1 tot

Any solutions would be greatly appreciated! If a solution needs other commands, an explanation of the reasoning behind them would also be greatly appreciated.

Thank you

janos · Accepted Answer · 2017-11-04T20:54:48.730

2

You missed two important details:

1. You need to pass all the words at once to a single uniq -c so it can count them, not one word at a time to a separate uniq each.
2. uniq only merges adjacent duplicate lines, so it expects its input to be sorted. The sort you had in the grep pipeline is ineffective because it runs before the conversion to lowercase, so the values would need to be sorted again afterwards.
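
As a small illustration of the second point, the sketch below feeds the same three words to uniq -c with and without sorting first. The helper name count_words is made up for this example.

```bash
# uniq -c only merges identical lines that sit next to each other,
# so the same word on non-adjacent lines is counted separately.
printf '%s\n' sas did sas | uniq -c       # three lines, each with a count of 1

# Hypothetical helper: sort first so duplicates become adjacent, then count.
count_words() {
    sort | uniq -c
}

printf '%s\n' sas did sas | count_words   # did is counted once, sas twice
```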

You can apply sort | uniq -c to the output of an entire loop, by piping the loop itself:

grep -oiE '[a-z]{3,}' "$1" | tr '[:upper:]' '[:lower:]' | while read -r word; do
    [[ $word == $(rev <<< "$word") ]] && echo "$word"
done | sort | uniq -c

Finally, to get an output sorted in descending order by count, you need to further pipe the output to sort -nr.
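
Putting the pieces together, the whole script might look like the sketch below. The function name palindrome_counts is only illustrative, and the right-hand side of the comparison is quoted here so that it is always compared literally rather than as a pattern.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around the pipeline described above.
palindrome_counts() {
    grep -oiE '[a-z]{3,}' "$1" |            # every run of 3+ letters, one per line
        tr '[:upper:]' '[:lower:]' |        # normalise case before comparing
        while read -r word; do
            # Keep the word only if it reads the same reversed.
            [[ $word == "$(rev <<< "$word")" ]] && echo "$word"
        done |
        sort |                              # group identical words together
        uniq -c |                           # count each group
        sort -nr                            # highest count first
}
```

Called as palindrome_counts testingfile.txt on the test file from the question, this should list sas first with a count of 9. The order among words with equal counts may differ from the listing in the question.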

edited Nov 04 '17 at 20:54

answered Nov 04 '17 at 20:51

So in my code, was it looping through and running "uniq -c" on each word found? And the code you supplied runs "uniq -c" once, after it finishes reading through the entire file? – Jhonathan Nov 04 '17 at 20:53
@Jhonathan yes, you were running `uniq` once for each word. If you pipe the loop itself into `sort | uniq -c | sort -nr`, then you will be filtering all the output produced in the loop's body. – janos Nov 04 '17 at 20:56
Also, following on from the answer you gave to my previous question: does this mean the script will run faster, since it does less work on each iteration of the loop? – Jhonathan Nov 04 '17 at 20:58
@Jhonathan yes, it will run much faster, because it will start far fewer processes – janos Nov 04 '17 at 20:59
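
The "fewer processes" point from the comments can be pushed one step further. The loop in the answer still starts one rev per word, so a possible variant (not what the answer proposes) is to do the palindrome check inside a single awk process. This is a sketch only, and the function name palindrome_counts_fast is made up for this example.

```bash
# Hypothetical variant: one awk process replaces the while loop and
# the per-word call to rev.
palindrome_counts_fast() {
    grep -oiE '[a-z]{3,}' "$1" |
        tr '[:upper:]' '[:lower:]' |
        awk '{
            # Build the reversed word one character at a time.
            reversed = ""
            for (i = length($0); i > 0; i--)
                reversed = reversed substr($0, i, 1)
            # Print the word only if it is a palindrome.
            if (reversed == $0) print
        }' |
        sort | uniq -c | sort -nr
}
```

The number of processes here stays fixed regardless of how many words the file contains, which is where any speed-up on a large file would come from.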
