How to search for a six-letter word with letters in alphabetical order

Question

I recently created a Perl script that searches for words that begin with D and and E with this code:

$infile = 'words.txt';
open(IN, $infile);
$count = 0;
while ($word = <IN>) {
chomp($word);
if ($word =~ /^d\w*e$/i) {
    print "$word\n";
    $count++;
  }
}
print "$count\n";

I recently decided to fork the code and create a script that searches for a word that is six letters and letters in the word are in alphabetic (A to Z) order. Instead of using words.txt, I plan to use the Unix standard dictionary located at usr/share/dict/words. How can I accomplish this by modifying this code?

First, extract all the 6 characters word. Then, for each of them, test the condition. — Benjamin Toueg, Jan 07 '13 at 01:18
I suggest you find all words in the `words` file that fulfil your criterion and then just search for any of those. What is the purpose of this task? — Borodin, Jan 07 '13 at 01:26
Ok, I made my question more broad. Any suggestions on how to modify this code? — sharksfan98, Jan 07 '13 at 01:34
Note that `\w` matches all numbers and the underscore character as well as letters. You probably want `\pL` or `[a-zA-Z]`. — Borodin, Jan 07 '13 at 01:41
1. get all 6 letter words. 2. make sure each letter is > than the previous one. — JaredC, Jan 07 '13 at 01:45
Is this just an arbitrart task you have set yourself to learn the language? If so then I don't see how it helps to come here for a solution. — Borodin, Jan 07 '13 at 02:30

Borodin · Accepted Answer · 2013-01-07T05:45:52.833

It looks like what you really need is an algorithm for checking whether the letters in a given word are in alphabetic sequence. There are several ways, but this subroutine works by splitting the word into a list of its constituent characters, sorting that list and recombining it. If the result matches the original word then that word was already sorted.

use strict;
use warnings;

use feature 'fc';

for (qw/ a ab ba cab alt effort toffee /) {
  print "$_\n" if in_alpha_order($_);
}

sub in_alpha_order {
  my $word = fc(shift);
  my $new = join '', sort $word =~ /./g;
  return $new eq $word;
}

output

a
ab
alt
effort

If you really wanted to do this in a regular expression, you could build an alternation like

a(?=[a-z]) | b(?=[b-z]) | c(?=[c-z]) ...

Here is a program that works that way. Its output is identical to that of the one above.

use strict;
use warnings;

my $regex = join '|', map "${_}(?=[$_-z])", 'a'..'z';
$regex = qr/^(?:$regex)*.$/i;

for (qw/ a ab ba cab alt effort toffee /) {
  print "$_\n" if $_ =~ $regex;
}

note: a single Unicode codepoint can span multiple bytes and a single user-perceived character can span multiple Unicode codepoints. To get correct number of character in a word that might contain non-ascii characters, you could use `/\X/` and to sort alphabetically you could use `Unicode::Collate`. See [point 32 in @tchrist's answer](http://stackoverflow.com/a/6163129/4279). — jfs, Jan 08 '13 at 14:07

score 2 · Answer 2 · edited Sep 23 '17 at 23:50

To support non-ASCII words:

#!/usr/bin/perl
use strict;
use warnings;
use encoding 'utf8'; # Upgrade byte strings using UTF-8
use Unicode::Collate; # To sort letters alphabetically

use constant NCHARS => 6; # Consider only words with NCHARS characters in them
my $filename = '/usr/share/dict/words';
open (my $fh, '<:encoding(UTF-8)', $filename)
    or die "can't open '$filename' $!";

my $collator = Unicode::Collate::->new();
while (my $word = <$fh>) {
    chomp $word;
    my @chars = ($word =~ /\X/g); # Split word into characters
    # Print words with given length that have characters in alphabetical order
    print "$word\n" if (@chars == NCHARS &&
                        join('', $collator->sort(@chars)) eq $word);
}
close $fh;

Kenosis · Answer 3 · 2013-01-07T02:54:08.553

1

Here's one option:

#!/usr/bin/env perl

use warnings;
use strict;

my $wordsFile = '/usr/share/dict/words';
my $count     = 0;

open my $fh, '<', $wordsFile or die $!;

while ( my $word = <$fh> ) {
    chomp $word;
    next unless length $word == 6;

    my $sorted = join '', sort split //, lc $word;

    if ( $sorted eq lc $word ) {
        print "$word\n";
        $count++;
    }
}

close $fh;

print "$count\n";

This splits the original word to alphabetize the letters. The letters are rejoined to form a new word. A comparison is then made against the original word. If they're the same, it's printed and counted.

edited Jan 07 '13 at 02:54

answered Jan 07 '13 at 02:34

Kenosis

6,196
1
16
16

I think the original `lc` was necessary. Or ideally `fc`. – Borodin Jan 07 '13 at 02:46
@Borodin - Yes, you're correct. Originally thought it did, but I misread the output after removing it. Thanks for catching this. – Kenosis Jan 07 '13 at 03:02

ajmccluskey · Answer 4 · 2013-01-07T20:42:52.847

1

I have a solution similar to Kenosis' and Borodin's, however you need to be careful about case. Perl's default sort function puts all capital letters before lower case. My version below takes care of this.

#!/usr/bin/env perl

use strict;
use warnings;

sub is_six_letter_word {
    my $word = shift;
    return length($word) == 6;
}

sub is_letters_in_alphabetical_order {
    my $word = shift;
    $word = fc($word);

    my @chars = split("", $word);
    my $sorted_word = join("", sort(@chars));

    return $word eq $sorted_word;
}

open(my $fh_in, $ARGV[0]) or die "Error opening input file";

my $word = undef;
while ($word = <$fh_in>) {
    chomp($word);
    if (is_six_letter_word($word) && is_letters_in_alphabetical_order($word)) {
        printf("%s\n", $word);
    }
}

close($fh_in);

edited Jan 07 '13 at 20:42

answered Jan 07 '13 at 02:51

ajmccluskey

505
4
10

The two `return`s are an eyesore. Why not just `return length($word) == 6` ? – Zaid Jan 07 '13 at 15:38
Thanks for the feedback. I think I got into the habit of explicitly returning 1 or 0 for true/false so I knew what I was getting when I started with Perl. Thinking about it again I don't think it was ever necessary. – ajmccluskey Jan 07 '13 at 20:44

How to search for a six-letter word with letters in alphabetical order

4 Answers4

Linked