
I'm trying to tokenise a line using strtok. The input file is:

InputVector:0(0,3,4,2,40)

I'm trying to read the numbers in, but I ran into something unexpected that I don't understand. My tokenising code looks like this:

    #define INV_DELIM1 ":"
    #define INV_DELIM2 "("
    #define INV_DELIM3 ",)"

    checkBuff = fgets(buff, sizeof(buff), (FILE*)file);

    if(checkBuff == NULL)
    {
        printf("fgets failure\n");
        return FALSE;
    }
    else if(buff[strlen(buff) - 1] != '\n')
    {
        printf("InputVector String too big or didn't end with a new line\n");
        return FALSE;
    }
    else 
    {
        buff[strlen(buff) - 1] = '\0';
    }

    token = strtok(buff, INV_DELIM1);
    printf("token %s", token);
    token = strtok(buff, INV_DELIM2);
    printf("token %s", token);

    while(token != NULL) {
            token = strtok(NULL, INV_DELIM3);
            printf("token %s\n", token);
            if(token != NULL) {
                number = strtol(token, &endptr, 10);
                if((token == endptr || *endptr != '\0')) {
                    printf("A token is Not a number\n");
                    return FALSE;
                }
                else {
                    vector[i] = number;
                    i++;
                }
            }
        }

output:

token InputVector
token 0
token 0
token 3
token 4
token 2
token 40
token

So the code first calls fgets and checks that the line fits in my buffer; if it does, it replaces the trailing newline with '\0'.

Then I tokenise the first word and the number outside the brackets. The while loop tokenises the numbers inside the brackets, converts them with strtol, and stores them in an array. I'm using strtol to detect whether each value inside the brackets is numeric, but it always reports an error because strtok returns a last token that isn't in the input. How do I stop that last token from being read so that strtol doesn't pick it up? Or is there a better way to tokenise and check the values inside the brackets?

The input file will later contain more than one input vector, and I have to be able to check whether each one is valid.

alk
alexW
  • Are you using Windows? How did you open `file`? – rici Oct 14 '18 at 03:02
  • @rici I'm using Windows, but I run it in PuTTY. I use fopen with the "r" mode, then a while loop with !feof(file) as the condition, then fgets() to get a line. Sorry if it's unclear; this is the only answer I can give without code... – alexW Oct 14 '18 at 03:17
  • The most likely thing is that your input line ends with a Windows newline sequence (`\r\n`). If your program runs on Unix and you are typing your input on Windows, the program won't know that it needs to do line-end translation. Try checking to see if the second-last character is `\r` after you check that the last character is `\n`. – rici Oct 14 '18 at 03:21
  • Also, after you find the colon (which you could just as easily do with `strchr`), you can tokenise the input with repeated calls to `strtol`, since that function tells you where the number ended. That's more precise since it lets you know which delimiter follows each number. – rici Oct 14 '18 at 03:23
  • Finally, please please read this: https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong – rici Oct 14 '18 at 03:26
  • @rici Yes, you are right, there's a \r\n. This is the first time I've heard of \r, so should I change that value to \0 before checking if there's a buffer overflow? Or is there a better way? – alexW Oct 14 '18 at 03:32
  • @rici aren't I using strtol already in my code? I don't understand what you're trying to say. – alexW Oct 14 '18 at 03:36
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181825/discussion-between-alexw-and-rici). – alexW Oct 14 '18 at 03:45
  • Added an answer with sample code, untested. – rici Oct 14 '18 at 04:13
  • In the fragment `token = strtok(buff, INV_DELIM1); printf("token %s", token); token = strtok(buff, INV_DELIM2); printf("token %s", token);` you should be getting the same token twice (`InputVector`, if I understand the input information). You'd need to supply `NULL` instead of `buff` in the second call to `strtok()` to skip the `0` before the `(`. – Jonathan Leffler Oct 14 '18 at 06:23

2 Answers


The most likely explanation is that your input line ends with the Windows newline sequence \r\n. If your program runs on Unix (or Linux) and you are typing your input on Windows, Windows will send the two-character newline sequence, but the Unix program won't know that it needs to do line-end translation. (If you ran the program directly on the Windows system, the standard I/O library would deal with the newline sequence for you by translating it to a single \n, as long as you don't open the file in binary mode.)
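
If you prefer to trim the line before tokenising, one option is to cut it at the first \r or \n, whichever convention produced it. This is only a sketch: `strip_line_ending` is a hypothetical helper name, not something from the question, and it treats a lone \r as a line terminator too.

```c
#include <string.h>

/* Hypothetical helper: cut the line at the first '\r' or '\n'.
   Returns 1 if a line terminator was found (so the whole line fit in
   the buffer), 0 otherwise. */
int strip_line_ending(char *line)
{
    size_t len = strcspn(line, "\r\n");   /* length of the part before any CR/LF */
    int had_terminator = (line[len] != '\0');

    line[len] = '\0';                     /* drop "\n", "\r\n" or a lone "\r" */
    return had_terminator;
}
```

The return value can stand in for the "line too long" check in the question: if no terminator was found, fgets filled the buffer before reaching the end of the line (or the last line of the file has no newline).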

Since \r is not in your delimiter list, strtok will treat it as an ordinary character, so your last field will consist of the \r. Printing it out is not quite a no-op, but it's invisible, so it's easy to get fooled into thinking that an empty field is being printed. (The same would happen if the field consisted only of spaces.)
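
A quick way to confirm this kind of problem is to print tokens with non-printable bytes made visible. The helper below is a hypothetical debugging aid (the name `escape_token` and the `\xNN` format are my own choices), sketched under the assumption that the output buffer is reasonably sized:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical debugging helper: copy tok into out, replacing every
   non-printable byte with a \xNN escape so that it becomes visible.
   Output is truncated if out is too small. */
void escape_token(const char *tok, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0) return;
    out[0] = '\0';
    for (; *tok != '\0'; ++tok) {
        unsigned char c = (unsigned char)*tok;
        int n;

        if (outsize - used < 5) break;    /* room for "\xNN" plus the NUL */
        n = isprint(c) ? snprintf(out + used, outsize - used, "%c", c)
                       : snprintf(out + used, outsize - used, "\\x%02X", c);
        used += (size_t)n;
    }
}
```

Printing the escaped text between brackets, for example with `printf("token [%s]\n", out);`, shows the stray token as `[\x0D]` instead of what looks like an empty field.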

You could just add \r to your delimiter list. Indeed, you could add both \n and \r to the delimiter list in your strtok call, and then you wouldn't need to worry about trimming the input line. That will work because strtok treats any sequence of delimiter characters as a single delimiter.
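
As a rough sketch of that approach, the loop from the question could look like the following. The function name `parse_vector_strtok` and its parameters are hypothetical; the delimiter strings follow the question, with \r and \n added to the last one.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: parse "name:N(a,b,c)" in place with strtok, treating CR and LF
   as delimiters so that no separate trimming step is needed.
   Returns the count of numbers stored, or -1 on a malformed line. */
int parse_vector_strtok(char *line, long *vector, int max)
{
    int count = 0;
    char *token = strtok(line, ":");      /* the name, e.g. "InputVector" */

    if (token == NULL) return -1;
    token = strtok(NULL, "(");            /* the number before the '(' */
    if (token == NULL) return -1;

    while ((token = strtok(NULL, ",)\r\n")) != NULL) {
        char *endptr;
        long number = strtol(token, &endptr, 10);

        /* Reject non-numbers, trailing junk, and too many values */
        if (endptr == token || *endptr != '\0' || count == max) return -1;
        vector[count++] = number;
    }
    return count;
}
```

Note that the second call passes NULL rather than the buffer, so strtok continues from where the first call stopped.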

However, that may not really be what you want, since it will hide certain input errors. For example, if the input had two consecutive commas, strtok would treat them as a single comma, and you would never know that a field was skipped. You could solve that particular problem by using strspn instead of strtok, but I personally think the better solution is to not use strtok at all, since strtol will tell you where each number ends.
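
To see the skipped field, compare the number of tokens strtok reports with the number of fields implied by the delimiters. Both function names below are hypothetical, and the second one uses strpbrk rather than strspn purely to keep the illustration short:

```c
#include <string.h>

/* Count the tokens strtok reports in text (text is modified in place). */
int count_fields_strtok(char *text, const char *delims)
{
    int n = 0;

    for (char *t = strtok(text, delims); t != NULL; t = strtok(NULL, delims))
        ++n;
    return n;
}

/* Count fields as "number of delimiters + 1", so empty fields count too. */
int count_fields_exact(const char *text, const char *delims)
{
    int n = 1;

    for (text = strpbrk(text, delims); text != NULL;
         text = strpbrk(text + 1, delims))
        ++n;
    return n;
}
```

For the input `0,3,,2` with `","` as the delimiter, the strtok version reports three fields while the exact count is four: the empty field between the two commas has silently disappeared.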

For example (for simplicity, I left out printing of error messages; it's not necessary to check whether the line ends with a newline before this code, but if you feel that check is necessary, you can do it after the loop, once you have found the close parenthesis):

    #include <ctype.h>     /* For 'isspace' */
    #include <stdbool.h>   /* For 'false'   */
    #include <stdlib.h>    /* For 'strtol'  */
    #include <string.h>    /* For 'strchr'  */

    // ...

    char* token = strchr(buff, ':');          /* Find the colon */
    if (token == NULL) return false;          /* No colon */
    ++token;                                  /* Character after the colon */
    char* endptr;
    (void)strtol(token, &endptr, 10);         /* Read and toss away a number */
    if (endptr == token) return false;        /* No number */
    token = endptr;                           /* Character following number */
    /* Cast to unsigned char: isspace is undefined for negative char values */
    while (isspace((unsigned char)*token)) ++token;   /* Skip spaces (maybe not necessary) */
    if (*token != '(') return false;          /* Wrong delimiter */
    for (i = 0; i < n_vector; ++i) {          /* Loop until vector is full or ')' is found */
        ++token;                              /* Step over the '(' or ',' */
        vector[i] = strtol(token, &endptr, 10); /* Get another number */
        if (endptr == token) return false;    /* No number */
        token = endptr;                       /* Character following number */
        while (isspace((unsigned char)*token)) ++token; /* Skip spaces */
        if (*token == ')') break;             /* Found the close parenthesis */
        if (*token != ',') return false;      /* Not the right delimiter */
    }                                         /* Loop */
    /* At this point, either we found the ')' or we read too many numbers */
    if (*token != ')') return false;          /* Too many numbers */
    /* Could check to make sure the following characters are a newline sequence */
    /* ... */

The code which calls strtol to get a number and then checks what the delimiter is should be refactored, but I wrote it out like that for simplicity. I would normally use a function which reads a number and returns the delimiter (as with getchar()), or EOF if the end of the buffer is encountered. But it would depend on your precise needs.
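
Such a function might look roughly like this. It is a sketch only: the name `read_number` and its signature are hypothetical, and it returns EOF both when there is no number at all and when the buffer ends right after the number.

```c
#include <ctype.h>
#include <stdio.h>     /* For 'EOF' */
#include <stdlib.h>    /* For 'strtol' */

/* Hypothetical helper: parse a number at *pos and store it in *value.
   Returns the delimiter that follows the number (after skipping spaces)
   and advances *pos past it, or returns EOF if there is no number or the
   buffer ends before a delimiter is found. */
int read_number(const char **pos, long *value)
{
    char *endptr;

    *value = strtol(*pos, &endptr, 10);
    if (endptr == *pos) return EOF;                    /* No number */
    while (isspace((unsigned char)*endptr)) ++endptr;  /* Skip spaces */
    if (*endptr == '\0') return EOF;                   /* End of buffer */
    *pos = endptr + 1;                                 /* Step over the delimiter */
    return (unsigned char)*endptr;
}
```

The caller can then loop while the returned delimiter is `','` and stop when it sees `')'`, treating anything else as an error.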

rici

When you use strtok() you are first splitting the string at the delimiter ":" and afterwards at "(". For example, take the line

 InputVector:0(0,3,4,2,40)

When you call strtok(buff, ":") you get only the first token, InputVector. You have to call strtok(NULL, ":") to get the rest of the string, 0(0,3,4,2,40). I would not switch delimiters partway through, and you must not pass buff to strtok a second time: strtok writes a '\0' at the end of each token, so a second call on buff only ever sees the first part of the string. The simplest way to split this line is to use all the delimiters together, ":(),", which splits the whole line like this:

InputVector
0
0
3
4
2
40

The changes you need to make are:

    #define INV_DELIM1 ":(),\n"
    token = strtok(buff,INV_DELIM1); //for the first call of strtok
    token = strtok(NULL,INV_DELIM1); //for the rest of strtok call
tiagohbalves
  • I tried your solution using the same delimiters, but it doesn't work; I still get the empty token. I think that empty token is there after my buffer check, since I add a '\0' character at the end if the line is not too big. Is there a way for my program to break out of the loop if the token is empty (before NULL)? – alexW Oct 14 '18 at 02:49
  • You can certainly change to a different delimiter. There is no need to use the same delimiter string on every call. – rici Oct 14 '18 at 03:00
  • What is your complete code? Did you change every strtok call to `token = strtok(NULL,INV_DELIM1);`? It works for me: [https://gist.github.com/tiagohbalves/e88eaad9e9fda48990db75a2083c698e] – tiagohbalves Oct 14 '18 at 03:43