I have a huge dictionary file with one word per line, and I would like to split it into separate files by the first character of each word.

For example, a.txt should contain only the words that start with a.

I used this awk command to successfully extract the words that start with b:

  awk 'tolower($0)~/^b/{print}' titles-sorted.txt > b.txt

Now I want to repeat this for every letter of the alphabet:

  for alphabet in {a..z} 
    do
        awk 'tolower($0)~/^alphabet/{print}' titles-sorted.txt > titles-links/^alphabet.txt
    done 

But the resulting files are empty. What did I do wrong? I don't even know how to debug this. Thanks!

pandagrammer

1 Answer

Because your awk program is in single quotes, there will not be any shell variable expansion. In this example:

awk 'tolower($0)~/^alphabet/{print}' titles-sorted.txt > titles-links/^alphabet.txt

...you are looking for the lines that begin with the literal string alphabet.

This would work:

awk "tolower(\$0)~/^$alphabet/{print}" titles-sorted.txt > titles-links/$alphabet.txt

Note several points:

  • We use double quotes, which do not inhibit shell variable expansion.
  • We need to escape the $ in $0; otherwise the shell would expand it.
  • We replace alphabet with $alphabet, because that is how you refer to shell variables.
  • We replace ^alphabet with $alphabet in the filename passed to >.
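
As a way to debug this kind of problem, the points above can be checked by printing the program text exactly as awk would receive it. This is only a minimal sketch; the variable names single and double are made up for illustration:

```shell
alphabet=b

# Single quotes: the shell leaves the text untouched,
# so awk would see the literal word "alphabet".
single='tolower($0)~/^alphabet/{print}'

# Double quotes: the shell substitutes $alphabet,
# while \$0 survives as a literal $0 for awk.
double="tolower(\$0)~/^$alphabet/{print}"

printf '%s\n' "$single" "$double"
```

The first line printed still contains the word alphabet, while the second contains /^b/, which is the pattern awk should actually get.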

You could also transform the shell variable into an awk variable with -v, and do this:

for alphabet in {a..z} ; do
    awk -v alphabet="$alphabet" 'tolower($0)~"^"alphabet {print}' /usr/share/dict/words > "words-$alphabet.txt"
done
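
To try the -v approach without touching the real dictionary, something like the following could be run first. It is a sketch under assumptions: words.txt and its five sample words are invented for the test, and the loop covers only three letters to keep it short.

```shell
# Work in a throwaway directory; words.txt is a made-up sample file.
tmpdir=$(mktemp -d)
printf '%s\n' Apple banana Bob cherry avocado > "$tmpdir/words.txt"

for alphabet in a b c; do
    # -v copies the shell value into an awk variable of the same name;
    # a pattern with no action prints every matching line.
    awk -v alphabet="$alphabet" 'tolower($0) ~ "^" alphabet' \
        "$tmpdir/words.txt" > "$tmpdir/words-$alphabet.txt"
done

# Show what ended up in the file for the letter b.
cat "$tmpdir/words-b.txt"
```

With this sample, words-b.txt should hold banana and Bob, since tolower makes the match case-insensitive.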
larsks