C Programming: Finding the Longest Word from A Dictionary with Given Characters

Question

This is a minimised repost of a question I asked earlier. I am a beginner in C programming. I am attempting to create a Countdown program in which the user selects eight consonants and/or vowels and has to devise the longest word from these letters. The computer will then read a dictionary file and find the longest possible words. This function is the part of the program in which I compare the Countdown letters with a dictionary file.

#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int i, j, n;
    long int n1 = 0, n2 = 0, n3 = 0, n4 = 0, n5 = 0, n6 = 0, n7 = 0, n8 = 0;
    char line[9];
    char exampleLetters[] = "feacnehp";
    char *fileName = "D:\\webster.txt";     //Dictionary file

   FILE *fp = fopen(fileName, "r");

   if (fp == NULL) {
       printf("Error opening file!\n");
    }
    else
    {
       while (!feof(fp)) {
            fgets(line, 9, fp);

            n = 0;

            for (i = 0; i <  8; i++) {

                for (j = 0; j < 8; j++) {
                    if (line[i] == exampleLetters[j]) n++;
                }
            }

            if (n == 1) n1++;   //These values are incremented everytime a word of that amount of letters is found i.e. n1++ when a one letter word is found
            if (n == 2) n2++;
            if (n == 3) n3++;
            if (n == 4) n4++;
            if (n == 5) n5++;
            if (n == 6) n6++;
            if (n == 7) n7++;
            if (n == 8) n8++;
        }

        printf("%li %li %li %li %li %li %li %li\n", n1, n2, n3, n4, n5, n6, n7, n8);    //This is irrelevant but just to display the amount of each number of words
    }

    fclose(fp);

    return 1;
}

My problem is in the readFile function. I'm not sure how to compare the Countdown letters with the dictionary file. I am able to count the number of words that match the letters. Should I read the words in and use malloc to continually allocate memory, or is there a better alternative?

Trying to understand.. so I'll give you 8 letters, that means there are 8! == 40320 different permutations of those letters, and you want to find which permutation creates the longest word? — yano, Apr 13 '16 at 23:16
The buffer seems too small. If there are 8 characters in each line, the newline character after the first line will be left for the next read, and that may cause some trouble. — MikeCAT, Apr 13 '16 at 23:18
@yano Yeah. I have a previous part of the program that generates eight random vowels and/or consonants. This is what the exampleLetters are for and I want to compare these letters through a dictionary file to find the longest words possible. — Ulysses, Apr 13 '16 at 23:21
[`!feof(fp)` is a bad loop condition](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). You should check if `fgets()` was successful before using "the data read". — MikeCAT, Apr 13 '16 at 23:21
@MikeCAT An array would be more efficient. My main focus is comparing the letters with the words in the file and a larger buffer would be better. Thanks! — Ulysses, Apr 13 '16 at 23:23
Will all 8 letters have to be used or can there be some unused extras? Hardest part to me would be coming up with an algorithm that iterates through all possible permutations of the letters, although I'm sure there are some folks here that could get that going no sweat. — yano, Apr 13 '16 at 23:27
@yano Unfortunately I have to use all eight letters. The file only contains words of eight letters at most, too. — Ulysses, Apr 13 '16 at 23:31
I'd say that's fortunate. You don't have to compare partial strings then, that part will be straightforward. You need a function that spits out the next permutation of your characters, then just `strcmp` that with each line in the dictionary. Getting the next permutation is the question mark for me. — yano, Apr 13 '16 at 23:35
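MikeCAT's two comments above (loop on the result of `fgets()` rather than on `feof()`, and use a buffer with room for the line ending) can be sketched as follows. This is only a minimal illustration: `read_word` and `count_words` are hypothetical helper names, and the 64-byte buffer is an assumed size, not something taken from the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the line ending.
   Returns 1 if a line was read, 0 at end of file or on a read error. */
int read_word(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL) {
        return 0;                       /* nothing was read, so buf must not be used */
    }
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut the string at the first '\r' or '\n' */
    return 1;
}

/* Hypothetical helper: counts the non-empty lines of an already opened file. */
long count_words(FILE *fp)
{
    char line[64];      /* assumed size: much larger than 8 letters + "\r\n" + '\0' */
    long count = 0;

    /* The loop condition is the read itself, so the body never sees stale data. */
    while (read_word(fp, line, sizeof line)) {
        if (line[0] != '\0') {
            count++;
        }
    }
    return count;
}
```

A caller would open the dictionary with `fopen`, check the result for `NULL`, pass the stream to `count_words`, and call `fclose` only on a stream that was actually opened. Lines longer than the buffer would be split across several reads, which should not matter for a dictionary of short words.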

score 1 · Answer 1 · answered Apr 14 '16 at 01:58

You don't need to consider permutations of the letters. All you do is stick them in an array and sort them (all lower-case).

When you read your dictionary, you do the same thing: Keep the original word, and make a "key" by sorting the letters (again, all lower-case).

If a letter can be used more than once then you would reduce these "keys" (as well as your input "key") by removing duplicate characters.

Now you can use a simple search through your dictionary looking for the longest word whose "key" matches (or is a subset of) yours. Because everything is sorted, you can use strstr to look for a full or partial match:

if( NULL != strstr( available_chars_key, word_key ) ) ...
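One caveat: `strstr` only finds a contiguous run of characters, so it misses a word whose sorted key has gaps relative to the sorted letters (for example the key of "snail", `ailns`, is not a substring of `abhilnos`). A minimal sketch of the same sorted-key idea with a subset check that allows gaps is shown below. It assumes each letter may be used only once, and `make_key` and `key_is_subset` are hypothetical names chosen for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders single characters by value. */
static int compare_chars(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Hypothetical helper: writes the lower-cased, sorted letters of word into key.
   key must have room for strlen(word) + 1 bytes. */
void make_key(const char *word, char *key)
{
    size_t len = strlen(word);
    size_t i;

    for (i = 0; i < len; i++) {
        key[i] = (char)tolower((unsigned char)word[i]);
    }
    key[len] = '\0';
    qsort(key, len, 1, compare_chars);
}

/* Hypothetical helper: returns 1 if every character of word_key can be matched
   to a distinct character of letters_key. Both keys must already be sorted. */
int key_is_subset(const char *word_key, const char *letters_key)
{
    const unsigned char *w = (const unsigned char *)word_key;
    const unsigned char *l = (const unsigned char *)letters_key;

    while (*w != '\0') {
        while (*l != '\0' && *l < *w) {
            l++;                /* skip available letters the word does not need */
        }
        if (*l != *w) {
            return 0;           /* the needed letter is missing or already used up */
        }
        l++;                    /* consume the matched letter */
        w++;
    }
    return 1;
}
```

With the letters "nailshob" the key is `abhilnos`; "hobnail" and "snail" both pass `key_is_subset`, while "noon" fails because only one `n` and one `o` are available.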

figured there was probably a better way than brute force permutations ... nice! — yano, Apr 14 '16 at 14:10

score 1 · Accepted Answer · answered Apr 16 '16 at 21:05

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int amount;
char **longestWords;    //Used a double pointer to allocate memory based on how many strings there were and the size of each string
char exampleLetters[] = "nailshob";     //These are a group of example letters that will be randomly generated in a previous part of the program
char *fileName = "D:\\webster.txt";

int initialiseWords(int num);
int copyWords(int val);

int main()
{
    int i, j, k, len;
    long int n[8] = { 0 };     //Used an array as it was suggested to be more efficient than 8 integers
    char line[12];
    char temp[9];

    FILE *fp = fopen(fileName, "r");

    if (fp == NULL) {
        printf("Error opening file!\n");
    }
    else
    {
        while (fgets(line, sizeof(line), fp)) {
            k = 0;
            len = (int)strcspn(line, "\r\n");    //Length of the word without the trailing '\n' (or "\r\n")

            strcpy(temp, exampleLetters);    //Copied the letters to a temporary string

            for (i = 0; i < len; i++)
            {
                for (j = 0; j < 8; j++)
                {
                    if (line[i] == temp[j])
                    {
                        temp[j] = '\0';    //If the character is found I eliminate it from the temporary string to prevent duplicated letters from affecting the results
                        k++;    //k is incremented every time a letter in the temporary string is the same as the letter in line
                        break;
                    }
                }
            }

            //If every letter of the word was matched (k) and the word is exactly that long (len),
            //the matching n value is incremented, e.g. n[4]++ if a 5 letter word is found.
            //The length check is needed because a longer word can share the same number of letters,
            //e.g. the longest word for "nailhobq" is hobnail, but hobnails would otherwise be counted too.
            for (i = 8; i > 0; i--)
            {
                if (k == i && len == i) {
                    n[i - 1]++;
                    break;
                }
            }
        }

        for (i = 7; i >= 0; i--)
        {
            if (n[i] != 0)
            {
                initialiseWords(n[i]);    //Allocates memory to **longestWords based on the size of n[i]
                copyWords(i);    //This function is then used to copy the longest words to **longestWords
                break;
            }
        }
        fclose(fp);    //Only close the file if it was opened successfully
    }

    return 0;
}

int initialiseWords(int num)
{
    int i;

    longestWords = (char**)malloc(num * sizeof(char*));

    for (i = 0; i < num; i++)
    {
        longestWords[i] = (char*)malloc((9)* sizeof(char));
    }

    return 1;
}

int copyWords(int val)
{
    int i, j, k, l;
    char line[12];
    char temp[9];

    FILE *fp = fopen(fileName, "r");

    if (fp == NULL) {
        printf("Error opening file!\n");
    }
    else
    {
        l = 0;

        while (fgets(line, sizeof(line), fp)) {
            size_t len = strcspn(line, "\r\n");    //Length of the word without the line ending
            line[len] = '\0';    //Strip the line ending so an 8 letter word fits in the 9 bytes allocated for it

            k = 0;
            strcpy(temp, exampleLetters);

            for (i = 0; i < (int)len; i++)
            {
                for (j = 0; j < 8; j++)
                {
                    if (line[i] == temp[j])
                    {
                        temp[j] = '\0';
                        k++;
                        break;
                    }
                }
            }

            if (k == val + 1 && (int)len == val + 1)
            {
                strcpy(longestWords[l], line);    //Same process as the original function until here. I copy the words that represented the highest values of i in n[i] to **longestWords
                l++;
            }
        }
        }
        fclose(fp);    //Only close the file if it was opened successfully
    }

    return 1;
}

I know it isn't the most efficient method, but to my understanding it works. I created a temporary string inside the while loop and copied exampleLetters[] into it. Every time line shared a character with the temporary string, I incremented k and overwrote that character in the temporary string with a null character so that duplicate letters would not affect the results. Next I compared k and the length of line with each value of i, starting from the highest and working down. If both matched, I incremented the corresponding n[] entry. Once I knew the highest index with a non-zero n[], and therefore the length of the longest word, I allocated memory based on how many words of that length there were and used the copyWords(int val) function to copy those words into **longestWords. Thanks to those who helped me.
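For comparison, the matching step described above can also be written with a table of letter counts instead of a temporary string. The following is only a sketch of that alternative, not the code used in this answer: `can_spell` and `longest_spellable` are hypothetical names, the words come from an in-memory array rather than the dictionary file, and only the first longest word is kept.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if word can be spelled using each character
   of letters at most once, otherwise 0. */
int can_spell(const char *word, const char *letters)
{
    int available[256] = { 0 };     /* how many of each character are still unused */

    for (; *letters != '\0'; letters++) {
        available[(unsigned char)*letters]++;
    }
    for (; *word != '\0'; word++) {
        if (available[(unsigned char)*word] == 0) {
            return 0;               /* letter missing or already used up */
        }
        available[(unsigned char)*word]--;
    }
    return 1;
}

/* Hypothetical helper: copies the first longest spellable word into best and
   returns its length, or 0 if no word can be spelled. Words that would not
   fit in best (including the terminator) are skipped. */
size_t longest_spellable(const char *words[], size_t count,
                         const char *letters, char *best, size_t best_size)
{
    size_t best_len = 0;
    size_t i;

    if (best_size > 0) {
        best[0] = '\0';
    }
    for (i = 0; i < count; i++) {
        size_t len = strlen(words[i]);
        if (len > best_len && len < best_size && can_spell(words[i], letters)) {
            strcpy(best, words[i]);
            best_len = len;
        }
    }
    return best_len;
}
```

Because the word length is compared directly, a single pass is enough to find one longest word. Collecting every word of that length, as the code in this answer does, would still need either a second pass or a growing list.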
