
I have a piece of code that loops through a char array string to try and detect words. As it loops, if it detects A-Z, a-z, or an _ (underscore), it adds the character to a char array. Because these are words, what I need is to be able to put each one into a string that I can then pass to another function to check, after which it can be discarded. This is my function:

```c
char wholeProgramStr2[20000];
char wordToCheck[100] ="";

IdentiferFinder(char *tmp){
    //find the identifiers
    int count = 0;
    int i;
    for (i = 0; i < strlen(tmp); ++i){
        Ascii = toascii(tmp[i]);
        if ((Ascii >= 65 && Ascii <= 90) || (Ascii >= 97 && Ascii <= 122) || (Ascii == 95))
        {
            wordToCheck[i] = tmp[i];
            count++;
            printf("%c",wordToCheck[i]); 
        }
        else {
            if (count != 0){
                printf("\n");
            }
            count = 0;
        }
    }
    printf("\n");
}
```

At the moment I can see all of the words because it prints them out on separate lines.

The content of `wholeProgramStr2` is all the lines of the file, and it is what gets passed as the `*tmp` argument.

Thank you.

Sourav Ghosh
DonnellyOverflow
  • Never compare against magic numbers. Use `isalpha()` and `'_'`. – unwind Nov 12 '14 at 14:31
  • Your question is similar to [this one](http://stackoverflow.com/q/26869798/841108), which is duplicate of [that](http://stackoverflow.com/questions/308695/how-to-concatenate-const-literal-strings-in-c). Adapt [my answer](http://stackoverflow.com/a/26869883/841108) to your needs here. – Basile Starynkevitch Nov 12 '14 at 14:33
  • You currently fill the `wordToCheck` array already (but remove the `=""` initializer; global variables are zero-initialized, so you will get a NUL-terminated string in the array, given that `tmp` is not too long). It's a global variable, so you can access it from other functions. Please clarify what you want or what the problem is. – Jite Nov 12 '14 at 14:34
  • What the hell is `Ascii = toascii(tmp[i]);`? Where is the `Ascii` defined? What's its type? – EOF Nov 12 '14 at 14:35
  • Can you explain what delimiters you are using to parse words from `wholeProgramStr2`? Usually, spaces, tabs, etc. are used as delimiters for this type of parsing. Is this what you are doing? – ryyker Nov 12 '14 at 14:42
  • @EOF `Ascii` is an integer, and `tmp` is a pointer to `wholeProgramStr2`. `tmp` is longer than 100 characters. I'll change it. – DonnellyOverflow Nov 12 '14 at 15:20
  • I can't edit the code from my phone but, basically, from a huge string I want to remove characters in order and place them into a new char array starting from the 0 index. Then I want to take the word out of the array as a string so it can be used as a string to pass it to another function. – DonnellyOverflow Nov 12 '14 at 15:24
  • I would strongly recommend not exercising the `implicit int` "feature" of older C standards. It's been removed from the newer standards for good reason. – EOF Nov 12 '14 at 17:06
  • @EOF What is the standard? Please explain. – DonnellyOverflow Nov 12 '14 at 19:31
  • @JamesDonnelly - Some C implementations support _implicit_ typing of variables. That is, if you use a variable without declaring it, such as `Ascii`, it will be _implicitly_ typed by the compiler (at compile time) to be an `int` type. EOF is simply suggesting it is better to be explicit, and declare the variable, as you have done for the others, eg: `int Ascii = 0;`. – ryyker Nov 12 '14 at 19:37
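Putting the comments' suggestions together (declare every variable, use `isalpha()` and `'_'` instead of magic numbers, and build each word from index 0 of the buffer rather than at `wordToCheck[i]`), the question's loop might be reworked along the lines of the sketch below. It is only a sketch under assumptions: `collectWords`, `word_fn` and the callback-with-context design are hypothetical, chosen so the "another function to check" can be plugged in; `MAX_WORD` mirrors the question's 100-byte buffer. In the default "C" locale, `isalpha()` matches exactly A-Z and a-z.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORD 100   /* same size as wordToCheck in the question */

/* Hypothetical signature for the "another function to check" a word. */
typedef void (*word_fn)(const char *word, void *ctx);

/* Scans src; every run of letters/underscores is copied into a local
   buffer starting at index 0, NUL-terminated and passed to check()
   (if check is NULL the words are only counted). Characters beyond
   MAX_WORD - 1 in a single word are dropped. Returns the word count. */
int collectWords(const char *src, word_fn check, void *ctx)
{
    char word[MAX_WORD];   /* reused for every word */
    size_t count = 0;      /* length of the word currently being built */
    int found = 0;

    for (;; ++src) {
        unsigned char c = (unsigned char)*src;  /* isalpha() needs an unsigned char value */
        if (isalpha(c) || c == '_') {
            /* index by count, not by position in src */
            if (count < MAX_WORD - 1)
                word[count++] = (char)c;
        } else {
            if (count != 0) {
                word[count] = '\0';     /* make it a proper C string */
                if (check)
                    check(word, ctx);   /* hand the word to the checker */
                found++;
                count = 0;              /* next word starts at index 0 again */
            }
            if (c == '\0')
                break;                  /* end of the source string */
        }
    }
    return found;
}
```

With a callback such as a hypothetical `void printWord(const char *w, void *ctx) { (void)ctx; puts(w); }`, calling `collectWords(wholeProgramStr2, printWord, NULL)` would print one word per line, like the original loop, but each word now also exists as a string while `printWord` runs.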

2 Answers


You describe breaking apart a big string into little strings (words).
Assuming you are using normal delimiters to parse, such as spaces, tabs, or newlines:

Here is a three-step approach:

1. Get information about your source string (number of words and length of the longest word).
2. Create your target array dynamically to fit your size needs.
3. Loop on `strtok()` to populate your target array of strings (`char **`).

A fourth step is to free the memory you allocated, which you will need to do.
Hint: the prototype could look like this:
`void Free2DCharArray(char **a, int numWords);`

Code example:

```c
#include <stdlib.h>
#include <string.h>

void FindWords(char **words, char *source);
void GetStringParams(const char *source, int *longest, int *wordCount);
char ** Create2DCharArray(char **a, int numWords, int maxWordLen);
#define DELIM " \n\t"

int main(void)
{
    int longestWord = 0, WordCount = 0;
    char **words = NULL;
    char string[]="this is a bunch of test words";  // modifiable array: strtok() writes into it

    //Get number of words, and longest word, use in allocating memory
    GetStringParams(string, &longestWord, &WordCount);

    //create array of strings with information from source string
    words = Create2DCharArray(words, WordCount, longestWord);
    if (!words) return 1;   // allocation failed (or there were no words)

    //populate array of strings with words
    FindWords(words, string);

    //Do not forget to free words (left for you to do)
    return 0;   
}

void GetStringParams(const char *source, int *longest, int *wordCount)
{
    char *tok;
    int Len = 0, KeepLen = 0;
    //strtok() modifies the string it scans, so work on a copy and leave source intact
    char *cpyString = malloc(strlen(source)+1);
    if (!cpyString) return;
    strcpy(cpyString, source);
    tok = strtok(cpyString, DELIM);
    while(tok)
    {
        (*wordCount)++;
        Len = (int)strlen(tok);
        if(Len > KeepLen) KeepLen = Len;
        tok = strtok(NULL, DELIM);
    }
    *longest = KeepLen;
    free(cpyString);  // the copy was only needed for counting
}

void FindWords(char **words, char *source)             
{
    char *tok;
    int i=-1;

    tok = strtok(source, DELIM);
    while(tok)
    {
        strcpy(words[++i], tok);  // each row holds maxWordLen + 1 bytes, enough for any token
        tok = strtok(NULL, DELIM);
    }
}

char ** Create2DCharArray(char **a, int numWords, int maxWordLen)
{
    int i;
    a = calloc(numWords, sizeof(char *));
    if(!a) return a;
    for(i=0;i<numWords;i++)
    {
        a[i] = calloc(maxWordLen + 1, 1);   // +1 for the terminating NUL
        if(!a[i])
        {
            while(i--) free(a[i]);   // release the rows allocated so far
            free(a);
            return NULL;
        }
    }
    return a;
}
```
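The hint above leaves `Free2DCharArray` as an exercise. A minimal sketch matching that prototype, assuming the array was built by `Create2DCharArray` (one `calloc` for the pointer array plus one per word), frees the rows first and the pointer array last:

```c
#include <stdlib.h>

/* Releases an array of numWords strings created like Create2DCharArray():
   each row first, then the array of row pointers. free(NULL) is a no-op,
   so NULL rows are harmless; a NULL array is ignored. */
void Free2DCharArray(char **a, int numWords)
{
    int i;
    if (!a) return;
    for (i = 0; i < numWords; i++)
        free(a[i]);   /* each word buffer */
    free(a);          /* the array of pointers itself */
}
```

In `main`, that would mean calling `Free2DCharArray(words, WordCount);` just before `return 0;`. Freeing the rows before the pointer array matters: once `a` is freed, the row pointers stored in it can no longer be read.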
ryyker

If your goal is to look for words in an array of chars, you probably want to first find a valid sequence of characters (and you seem to be trying to do that), and once you've found one, do the secondary check to know whether it is a real word. If it is indeed a word, you may then decide to keep it for further use.

The advantage of this approach is that you don't need to keep a large buffer of potential words, you only need a fixed one, of size matching the largest word in your dictionary. In fact, you might not even need a buffer, but just a pointer sliding along the char array, pointing at the start of a possible word, and an int (though a byte might suffice) to keep track of the length of that word.

```c
#include <ctype.h>

// structure to store a word match in array
typedef struct token_s {
  int length;
  const char *data;
} token_t;

void nextToken(const char *tmp, int len, token_t *to){
  const char *start = NULL;   // const: it points into the const input
  while (len > 0){
    if (start) {
      // search for end of current word
      if (!isalpha((unsigned char)*tmp)) {
        to->data = start;
        to->length = (int)(tmp - start);
        return;
      }
    } else { 
      // search for beginning of next word
      if (isalpha((unsigned char)*tmp))
        start = tmp;
    }
    tmp++;
    len--;
  } // while
  // the word runs up to the end of the scanned range
  if (start) {
    to->data = start;
    to->length = (int)(tmp - start);  
  }
}
```

Simply pass:

  • the start of your char array, or `to->data + to->length + 1` if it's not beyond the end of the array
  • the remaining length of the char array to scan
  • a pointer to a zeroed `token_t`

to each call to nextToken, and check the token's content to know if it found a candidate; if it didn't, you know that the array has been scanned entirely.

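```c
void scanArray(const char *tmp, int len){
  const char *end = tmp + len;     // one past the last character to scan
  while (tmp < end){
    token_t to;
    to.data = NULL;
    to.length = 0;
    nextToken(tmp, (int)(end - tmp), &to);
    if (!to.data) break;           // no candidate left: array fully scanned
    // process token here... (to.data points at it, to.length is its size;
    // it is NOT NUL-terminated)
    // advance from the token itself, not from the old scan start,
    // skipping the delimiter after it if there is one
    tmp = to.data + to.length;
    if (tmp < end) tmp++;
  } // while
}
```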

I used `isalpha` to test for valid characters, but you'll want to replace it with a function of your own (for example, one that also accepts `_`, as in the question). And you'll have to insert your own code for that secondary checking in the body of `scanArray`.

didierc