
I'm trying to check if a character belongs to a list/array of invalid characters.

Coming from a Python background, I used to be able to just say:

for c in string:
    if c in invalid_characters:
        #do stuff, etc

How can I do this with regular C char arrays?

Srikar Appalaraju
Amarok

7 Answers


Some less well-known but extremely useful functions in the C library (standard since C89, which means 'forever') provide the information in a single call. Actually, there are multiple functions: an embarrassment of riches. The relevant ones for this are:

7.21.5.3 The strcspn function

Synopsis

#include <string.h>
size_t strcspn(const char *s1, const char *s2);

Description

The strcspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters not from the string pointed to by s2.

Returns

The strcspn function returns the length of the segment.

7.21.5.4 The strpbrk function

Synopsis

#include <string.h>
char *strpbrk(const char *s1, const char *s2);

Description

The strpbrk function locates the first occurrence in the string pointed to by s1 of any character from the string pointed to by s2.

Returns

The strpbrk function returns a pointer to the character, or a null pointer if no character from s2 occurs in s1.

The question asks about 'for each char in string ... if it is in list of invalid chars'.

With these functions, you can write:

size_t len = strlen(test);
size_t spn = strcspn(test, "invald");

if (spn != len) { ...there's a problem... }

Or:

if (strpbrk(test, "invald") != 0) { ...there's a problem... }

Which is better depends on what else you want to do. There is also the related strspn() function, which is sometimes useful (whitelist instead of blacklist).

Jonathan Leffler
  • amazing how many people suggested `strchr`! `strpbrk` is clearly the ideal solution. – Evan Teran May 25 '10 at 00:37
  • This should actually be the accepted answer! C is not entirely without batteries... – Aktau Oct 03 '14 at 09:27
  • In your description of strpbrk `any character from the string pointed to by s2`, but in your example you test against `"invalid"`. Won't this look for any occurrence of any of the individual characters in the string `"invalid"`? – Isaac Baker Sep 23 '15 at 17:27
  • @IsaacBaker: I'm not sure what you're asking. To my mind, your "won't this" question is asking whether the function will behave as documented (the answer is yes), but I'm sure you must have had a reason for asking, so I must be misunderstanding what you are asking. – Jonathan Leffler Sep 23 '15 at 17:31
  • @JonathanLeffler: What I was trying to ask/say is this: Your response seems somewhat conflicting or confusing. In your description of `strpbrk` you have the correct definition, however, in your example, `if (strpbrk(test, "invalid") != 0) { ...there's a problem... }`, it appears you're looking for a contiguous string of characters, `"invalid"`, but will instead search for the first occurrence of **any** character contained in `"invalid"`. Is this true and correct? – Isaac Baker Oct 20 '15 at 15:34
  • @IsaacBaker: You might notice that the string I use is `"invald"` and not `"invalid"`. I am not looking for the word 'invalid' (though if the 'haystack' to be searched contained "invalid", then `strpbrk(haystack, "invald")` would find the first letter from the character class `[adilnv]` in the haystack, which might be the 'i' at the start of "invalid"). So, given that I didn't use "invalid" and did use "invald", I'm not sure what's confusing. (I note that `strpbrk(haystack, "the quick brown fox jumps over the lazy dog")` is a way of finding a lowercase alphabetic character or space in the haystack.) – Jonathan Leffler Oct 20 '15 at 17:05
  • "an embarrassment of riches" is probably the greatest thing I've read on the internet in 10 years. – NorseGaud May 12 '22 at 00:57

The equivalent C code looks like this:

#include <stdio.h>
#include <string.h>

// This code outputs: h is in "This is my test string"
int main(void)
{
   const char *invalid_characters = "hz";
   const char *mystring = "This is my test string";   // string literals are read-only
   const char *c = mystring;
   while (*c)
   {
       if (strchr(invalid_characters, *c))
       {
          printf("%c is in \"%s\"\n", *c, mystring);
       }

       c++;
   }

   return 0;
}

Note that invalid_characters is a C string, i.e. a null-terminated char array.

RichieHindle
  • Trying not to be nitpicky, but if it's in C, shouldn't you replace std::cout with the equivalent printf() call? Or at least something that exists in C? – DeadHead Jul 01 '09 at 22:07
  • Although this loop using strchr() works, I think it is better to use `strcspn()` or `strpbrk()` without the loop in the user code. – Jonathan Leffler Jul 01 '09 at 23:29
  • @Jonathan: You're right, but I've kept the code similar to the OP's original Python, and answered the question "check if a char exists in a char array". – RichieHindle Jul 01 '09 at 23:40
  • Wouldn't it be better in this situation to loop through the invalid characters and check if they exist in mystring (with strchr), rather than checking each character in mystring to see if it is an invalid character? – mk12 Jul 15 '10 at 17:18
  • @mk12: It depends. Suppose you have N characters in the haystack being searched, and M characters in the needle which you're looking for. If all the characters in the haystack are valid, you end up with NxM comparisons whichever way round you do it. Normally, M will be much smaller than N. If the first character of the haystack is the last invalid character in the needle, your search would do N*(M-1) comparisons, whereas the alternative would do just M comparisons. If the first character in the haystack is the first invalid character in the needle, both systems stop after 1 comparison. – Jonathan Leffler Mar 07 '17 at 18:43
  • @mk12: Yes, there is also a scenario where your proposal outperforms the alternative. The last character in the haystack is the first invalid character in the needle (everything else is OK). Then your proposal does N comparisons, and the alternative does (N-1)xM+1 comparisons. – Jonathan Leffler Mar 07 '17 at 18:45

Assuming your input is a standard null-terminated C string, you want to use strchr:

#include <string.h>

const char *foo = "abcdefghijkl";
if (strchr(foo, 'a') != NULL)
{
  // do stuff
}

If on the other hand your array is not null-terminated (i.e. just raw data), you'll need to use memchr and provide a size:

#include <string.h>

char foo[] = { 'a', 'b', 'c', 'd', 'e' }; // note last element isn't '\0'
if (memchr(foo, 'a', sizeof(foo)))
{
  // do stuff
}
Jonathan Leffler
DaveR
  • I am amazed at the score of this incomplete answer that does answer the question title but falls short of addressing the OP goal as expressed in Python. For once, there is a perfect solution in C that is more concise than the Python code: `if (strpbrk(string, invalid_characters)) { /* do stuff, etc. */ }` – chqrlie Aug 10 '23 at 17:32

Use the strchr function when dealing with C strings.

char *strchr(const char *s, int c);

Here is an example of what you want to do.

/* strchr example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char invalids[] = ".@<>#";
  char * pch;
  pch=strchr(invalids,'s');//is s an invalid character?
  if (pch!=NULL)
  {
    printf ("Invalid character");
  }
  else 
  {
     printf("Valid character");
  } 
  return 0;
}

Use memchr when dealing with memory blocks (such as arrays that are not null-terminated):

void *memchr(const void *s, int c, size_t n);

/* memchr example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char * pch;
  char invalids[] = { '@', '<', '>', '#' };  /* not null-terminated */
  pch = memchr (invalids, 'p', sizeof invalids);
  if (pch!=NULL)
    printf ("p is an invalid character.\n");
  else
    printf ("p is a valid character.\n");
  return 0;
}

http://www.cplusplus.com/reference/clibrary/cstring/memchr/

http://www.cplusplus.com/reference/clibrary/cstring/strchr/

Tom

You want

char *strchr(const char *s, int c)

If the character c is in the string s, it returns a pointer to its location in s. Otherwise, it returns NULL. So just use your list of invalid characters as the string.

Keith Smith

Use strchr to search for a char from the start of a string (strrchr to search from the end):

  char str[] = "This is a sample string";

  if (strchr(str, 'h') != NULL) {
      /* h is in str */
  }
dfa

I believe the original question said:

a character belongs to a list/array of invalid characters

and not:

belongs to a null-terminated string

If it did, strchr would indeed be the most suitable answer. If, however, the array of chars is not null-terminated, or the chars are held in a list structure, you will need either to create a null-terminated string and use strchr, or to iterate manually over the elements of the collection, checking each in turn. If the collection is small, a linear search will be fine. A large collection may need a more suitable structure to improve search times: a sorted array or a balanced binary tree, for example.

Pick whatever works best for your situation.

Skizz