Pangram in C using functions

Question

When I input The quick brown fox jumps over the lazy dog, the following program prints not a pangram. Yet, I expect s to be 26 and printf("pangram") to be executed. What am I doing wrong?

#include <ctype.h>
#include <stdio.h>
#include <string.h>

char findpan(char arr[]) {
    int i, j, count = 0;
    for (i = 0; i < strlen(arr); i++) {
        if (isalpha(arr[i]))
            count++;
    }
    for (i = 0; i < strlen(arr); i++) {
        for (j = i + 1; j < strlen(arr); j++) {
            if (arr[i] == arr[j])
                count--;
        }
    }
    return (count);
}

int main() {
    int s;
    char str[60];
    fgets(str, 60, stdin);
    s = findpan(str);
    if (s == 26)
        printf("pangram");
    else
        printf("not a pangram");
    return 0;
}

One thing that is definitely a problem is that you are counting uppercase and lowercase letters as different ones. — Eugene Sh., Jun 30 '21 at 19:20
`s` is `-5`. The problem is the algorithm. Should you test for `0`? and be aware that if there are 4 o's in the text, you are subtracting 6, not 3. — Weather Vane, Jun 30 '21 at 19:21
Perhaps it's time to learn how to use a *debugger* to step through your code statement by statement, while monitoring variables and their values. — Some programmer dude, Jun 30 '21 at 19:21
Note that in your algorithm, you "uncount" any duplicated characters, not just alpha, so duplicate space characters cause `count` to decrement as well. — SGeorgiades, Jun 30 '21 at 19:50

Vlad from Moscow · Answer 1 · 2021-06-30T20:13:57.830

If I have understood what you are trying to do then these nested loops

for (i = 0; i < strlen(arr); i++) {
    for (j = i + 1; j < strlen(arr); j++) {
        if (arr[i] == arr[j])
            count--;
    }
}

are incorrect. Let's assume that you have string "AAA". So after the preceding loop count will be equal to 3.

Now after these nested loops count will be equal to 0 instead of 1. That is when i = 0 then for j = 1 and j = 2 arr[j] is equal to arr[i]. So count will be decreased two times. When i = 1 then for j = 2 again arr[j] = arr[i] and count will be decreased one more.

Also it seems you should ignore cases of letters.

I can suggest the following function implementation as it is shown in the demonstrative program below.

#include <stdio.h>
#include <ctype.h>

size_t findpan( const char *s )
{
    size_t count = 0;
    
    for ( const char *p = s; *p; ++p )
    {
        if ( isalpha( ( unsigned char ) *p ) )
        {
            char c = tolower( ( unsigned char )*p );
            
            const char *q = s;
            while ( q != p && c != tolower( ( unsigned char )*q ) ) ++q;
            
            if ( q == p ) ++ count;
        }
    }
    
    return count;
}

int main(void) 
{
    printf( "%zu\n", findpan( "The quick brown fox jumps over the lazy dog" ) );
    
    return 0;
}

The program output is

Without using pointers the function can look the following way

size_t findpan( const char *s )
{
    size_t count = 0;
    
    for ( size_t i = 0; s[i] != '\0'; i++ )
    {
        if ( isalpha( ( unsigned char ) s[i] ) )
        {
            char c = tolower( ( unsigned char )s[i] );
            
            size_t j = 0;
            while ( j != i && c != tolower( ( unsigned char )s[j] ) ) ++j;
            
            if ( j == i ) ++count;
        }
    }
    
    return count;
}

THank you so much, i have understood my mistake. However i am not yet familiar with pointers etc so I wont be able to implement the code you suggested but i will definitely try to change theb logic and solve the problem with a different approach. Thanks again — ACHAL KAMBOJ, Jun 30 '21 at 20:09

score 1 · Answer 2 · answered Jul 01 '21 at 10:07

If we assume 8 bit characters and can stand allocating 256 bytes on the stack temporarily, then this is both readable, compact and fairly efficient:

bool is_pangram (const char* str)
{
  char used [256]={0};
  for(; *str!='\0'; str++)
  {
    used[*str]=1;
  }
  return memchr(&used['a'], 0, 26)==NULL; // 26 letters in the alphabet
}

The 256 byte zero-out might seem inefficient, but the mainstream x86 compilers run that in 16 instructions. This function also makes no assumptions of adjacency of 'a' to 'z'. To add support for upper case, simply do used[tolower(*str)]=1; though that might introduce a lot of branching.

Test code:

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

bool is_pangram (const char* str)
{
  char used [256]={0};
  for(; *str!='\0'; str++)
  {
    used[*str]=1;
  }
  return memchr(&used['a'], 0, 26)==NULL;
}

int main (void) 
{
  const char* test_cases[] = 
  {
    "",
    "hello, world!",
    "the quick brown fox jumps over the lazy dog",
    "the quick brown cat jumps over the lazy dog",
    "junk mtv quiz graced by fox whelps",
    "public junk dwarves hug my quartz fox",
  };

  for(size_t i=0; i<sizeof test_cases/sizeof *test_cases; i++)
  {
    printf("\"%s\" is %sa pangram\n", test_cases[i], is_pangram(test_cases[i])?"":"not ");
  }

  return 0;
}

This needs to be `unsigned char`, otherwise high non-ASCII characters can access outside the array. (`char` is signed on some implementations, notably mainstream x86). Or clear the MSB along with forcing everything to upper-case to make it case-insensitive, like `used[ (~(0x20u|0x80u) ) & *str ] = 1;` instead of the more easily readable version of forcing to lower-case with `0x20 | *str`. (The algorithm is ASCII-only; probably won't work for 8-bit character sets where accented characters are also letters, so no need for `tolower` except readability.) Otherwise yeah, nice. — Peter Cordes, Jun 05 '23 at 21:12
If we care about performance, we could memset only the 26 or 32 bytes we care about in `used`, leaving the others uninitialized since we'll never read them. For strings of lengths not much more than 26, on CPUs without SIMD for stores wider than an integer register (e.g. RV32gc), this could be a significant optimization. Your strategy of not branching on alphabetic characters before storing inside the loop is probably good, though, especially for strings that are much longer than minimal pangrams. (To really go fast, this might be a use-case for SSE4.2 to check 16 bytes of input at once.) — Peter Cordes, Jun 05 '23 at 21:12

Giorgos Xou · Accepted Answer · 2021-07-24T13:02:38.170

0

Simple Solution?

Here a Simple solution, I made the guess that you probably just want to know if it is or it isn't a pangram and so i've changed your function to a boolean one:

#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool findpan(char arr[]) {
    int i,j;
    for (i = 'a'; i < 'z'; ++i) { // goes through the alphabet
        for (j = strlen(arr); j > 0; j--) // goes through the arr[] 
            if (tolower(arr[j]) == i) // checks if the letter exists
                break; // breaks the inner for-loop if letter found
          
        if (j == 0) // if letter not found
            return false;  
    }
    return true;
}

int main() {
    bool isPangram;
    char str[60];
    
    fgets(str, 60, stdin);
    isPangram = findpan(str);
    
    if (isPangram)
        printf("pangram");
    else
        printf("not a pangram");
    return 0;
}

Explanation

'a' to 'z' represent the range of the lowercase letters in Decimal numbers, in the ASCII table:

for (i = 'a'; i < 'z'; ++i)

tolower converts arr[j] character to lowercase and then compares it to i:

if (tolower(arr[j]) == i)

stdbool.h is introduced for the use of bool aka boolean:

#include <stdbool.h>

edited Jul 24 '21 at 13:02

answered Jun 30 '21 at 20:09

Giorgos Xou

1,461
1
13
32

Hi , Thank you so much for helping. I had one doubt , where is the header file stdbool.h used in the program. And dont we have to use if(isPanagram=true) ??? Thanks again. :)) – ACHAL KAMBOJ Jun 30 '21 at 20:28
@ACHALKAMBOJ i think i've made it more clear now under the Explenation section :D – Giorgos Xou Jun 30 '21 at 20:36
Thank you sooo muchh. – ACHAL KAMBOJ Jun 30 '21 at 20:42
You're welcome, i'm happy that you found your answer in my solution :)) – Giorgos Xou Jun 30 '21 at 20:45
2

Please try to avoid [magic numbers](https://en.wikipedia.org/wiki/Magic_number_(programming)). – Some programmer dude Jul 01 '21 at 07:33
@Someprogrammerdude do you mean **97** and **122**? *(I've got an explanation section for them [but thx for mentioning the term, i've had no idea of it])* should I've had used something along the lines of ```#define a 97``` and ```#define z 122``` and then [...]? *(or do i get something wrong?)* – Giorgos Xou Jul 01 '21 at 08:40
2

@GiorgosXou Since you limit yourself to lower-case ASCII, why not use the characters themselves, as in `for (i = 'a'; i < 'z'; ++i)`? – Some programmer dude Jul 01 '21 at 09:05
@Someprogrammerdude **yeah sure, why not?** In my hurry to answer i've totally forgot about it, and so I had thought it would had been more **insightful** and clear like that, but of course **you have a point there. I'm about to change it right now, thanks** – Giorgos Xou Jul 01 '21 at 10:14

score 0 · Answer 4 · answered Jul 01 '21 at 09:16

0

Limiting myself to plain ASCII you can create a simple array, with one element per letter and each element initialized to zero. Then loop over the string, and for each letter convert it to an index into the array and increase the corresponding elements value.

Once done with the input string, you loop over the array, and increase a counter for every non-zero value, and return that.

Perhaps something like this:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    char input[512];

    if (!fgets(input, sizeof input, stdin))
        return 1;  // Failed to read input

    int letters[26] = { 0 };  // 26 letters in the English alphabet

    for (unsigned i = 0; input[i] != '\0'; ++i)
    {
        if (isalpha(input[i]))
        {
            // Limiting myself to e.g. plain ASCII here
            ++letters[tolower(input[i]) - 'a'];
        }
    }

    // Count the number of non-zero elements in the letters array
    unsigned counter = 0;
    for (unsigned i = 0; i < 26; ++i)
    {
        counter += letters[i] != 0;
    }

    // Print result
    printf("Counter = %d\n", counter);
}

With your example input (The quick brown fox jumps over the lazy dog) it outputs

Counter = 26

This does only a single pass over the input string, and then a single pass over the letters array. No nested loop, no multiple passes over the input string.

answered Jul 01 '21 at 09:16

Some programmer dude

400,186
35
402
621

if you go for cpu optimization then why you don't just ```if(letters[tolower(input[i]) - 'a'] == 0){counter ++; ++letters[tolower(input[i]) - 'a'];}``` – Giorgos Xou Jul 01 '21 at 10:49
1

@GiorgosXou I know, but doing something like `Counter += ++letters[tolower(input[i]) - 'a'] == 1;` is taking it to a level even I'm not comfortable with. – Some programmer dude Jul 01 '21 at 11:30
1

@GiorgosXou: Branching on the count for a letter becoming non-zero for every character of the input string is more work than scanning the array, at least for large strings. A string shorter than 26 characters can't be an English / Latin-alphabet pangram. You do only need `bool` status values, though, since you're right that only zero vs. non-zero matters. And its cheaper to do a pure store of a bool than to load+increment+store a value. Clang even auto-vectorizes the checking loop over bools: https://godbolt.org/z/jfY73dhjf (based on Passionate SE's version) – Peter Cordes Jun 05 '23 at 05:02
1

@GiorgosXou: For long strings, there might be some merit to checking as you go if that allows an early-out, e.g. seeing the whole alphabet after the first 40 chars of a 1024 byte string. If you were optimizing for a use-case where long inputs happened sometimes, you might check after every 64 bytes. (Since that only costs a handful of instructions if clang vectorizes with SIMD. If not, then call `memchr` to get a similar fast scan, otherwise your idea for accumulating a count on the fly is probably the best for an early-out.) – Peter Cordes Jun 05 '23 at 06:55
@PeterCordes I didn't know about **SIMD** instructions before, **that's so Interesting!** Thanks for mentioning it. It took me quite a while to figure it out *(because I was trying to prove you wrong, lol)* but eventually I got it. – Giorgos Xou Jun 05 '23 at 10:13

Son of Man · Answer 5 · 2023-06-05T04:56:11.850

0

To check if a string is a Pangram ie whether it contains all the letters in the English alphabet then you can check the steps below on how you can achieve that in C.

Initialize an array of Booleans of size 26 for all the characters in the English alphabet to false.
Loop through the characters in your input string and set the index corresponding the index of that character to true in the array we declared above.
Loop through the modified Boolean array above and check if any value is false. If a value at any index is false then the input string is not a pangram
Return a true Boolean value outside the loop above, which is a statement that will be reached if all the indices in the array have a true value which means the input string is a pangram.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

//implement pangram function
bool isPangram(char * chr){
    /*1. declare an array of booleans equal to the size of the English alphabet 
      2. each index represents a corresponding character in the English alphabet
      3. When the boolean value in an index is set to true then that means, the character for that index in the alphabet exists in the input string 
    */
    bool alphas[26] = {false};
    //loop through characters in that string and update its value in the array
    for(int i=0;chr[i]!='\0';i++){
        //get the char at this index
        char c = chr[i];
        //check if its in the alphabet
        if(c>='a' && c<='z'){
            //get the index of that letter 
            int index = c-'a';
            //set that index to true in our array
            alphas[index] = true;
          //process capital letters
        }else if(c>='A' && c<='Z'){
            int index = c-'A';
            //set that index to true
            alphas[index] = true;   
        }
    }
    //loop through the boolean array and check for false(es) if any
    for(int i=0;i<26;i++){
        if(alphas[i]== false){
            return false;
        }
    }
    return true;
}
int main(int argc, char *argv[])
{
    //initialize an input string 
    char buff[1000];
    //read input from the user
    fgets(buff, sizeof(buff),stdin);
    //check if string is pangram
    if(isPangram(buff)){
        printf("%s %s",buff, " is a pangram");
    }else{
            printf("%s %s",buff, " is not a pangram");
    }
}

The most common input string you can use to check if the input string is a pangram is The quick brown fox jumps over the lazy dog.

NB:My code is limited to ASCII characters and won't work on a system using a different system of characters.

edited Jun 05 '23 at 04:56

answered May 31 '23 at 04:23

Son of Man

1,213
2
7
27

1

For ASCII, you can do `unsigned idx = (c|0x20u) - 'a';` `if (idx<=25u) alphas[idx] = true;`. ([What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515)) For non-ASCII, it's not even guaranteed that the codes for 'a'-'z' are contiguous, and IIRC they're not for EBCDIC. – Peter Cordes Jun 05 '23 at 03:50
@PeterCordes, Yes I learnt that you can subtract alphabet characters in C to get an index which is the position of the character in the alphabet, this feature of the C programming language is really fun to learn and use. – Son of Man Jun 05 '23 at 03:58
Maybe you missed the point of my linked answer: ORing with `0x20` means you don't have to check separately for upper and lower case characters. Also, that unsigned wrapping to a high number means `idx <= 'z'-'a'` will be false for non-alphabetic characters. https://godbolt.org/z/jfY73dhjf shows how compact and efficient the code becomes, and doesn't repeat the if body twice. I hoped that compilers would optimize `c |= 0x20;` `if (c >= 'a' && c <= 'z')` since they can normally optimize range-checks, but they don't use the same subtraction they already need for `idx`, so that gets done twice. – Peter Cordes Jun 05 '23 at 04:39
@PeterCordes, yes that could be effective for the o(n), I will try it out and update the answer. What about users who don't like to deal with ASCII characters? – Son of Man Jun 05 '23 at 04:43
Also no, subtracting characters in C to get an index is *not* guaranteed to be portable. Like I said, it only works for ASCII. It fails in practice for EBCDIC. Most modern code should really be written for UTF-8, though, so you can only use this trick if you've checked that your string is purely the ASCII subset of UTF-8, that there are no multibyte characters. – Peter Cordes Jun 05 '23 at 04:44
Are you talking about O(n) complexity class? Both ways are linear with string length, but doing it more efficiently (especially with less branching) reduces the "constant" factor that's not part of complexity class analysis. Also, little-o(n) means complexity is strictly greater than linear. big-O(n) means it's linear or less. (This algorithm is big-Omega(n), exactly linear growth for huge problem sizes.) – Peter Cordes Jun 05 '23 at 04:47
@PeterCordes, but the code you suggested will work for all of them, ASCII and EBCDIC included? – Son of Man Jun 05 '23 at 04:47
No, neither does yours. My change doesn't introduce any new failures, AFAIK, and in fact makes sure that even if you get wrong answers, you don't write outside the array. https://www.cs.umd.edu/~meesh/cmsc311/clin-cmsc311/Lectures/lecture6/ebcdic.html shows the EBCDIC table- the alphabetic characters are monotonically increasing, but not contiguous. So `c= 'z'` gives `idx = 40`. And the lower-case bit is 0x40 instead of 0x20 for ASCII. So I guess my version could fail for a hypothetical character set where a-z are contiguous but flipping bit 0x20 doesn't toggle case. – Peter Cordes Jun 05 '23 at 04:54
Anyway, if we limit to plain ASCII, which is normally fine for toy problems, or for text that's already been checked for having no multi-byte UTF-8 characters, then my version is more efficient. – Peter Cordes Jun 05 '23 at 04:55
@PeterCordes, yes I have edited the answer to limit it to only ASCII characters which will work for both your version and mine – Son of Man Jun 05 '23 at 04:57

Pangram in C using functions

5 Answers5

Simple Solution?

Explanation

Linked