-3

I am trying to split a string on every occurrence of a closing bracket and send it to a character array as a line by line in a while loop. This is the input I am reading in a char * input

(000,P,ray            ),(100,D,ray            ),(009,L,art            ),(0000,C,max            ),(0000,S,ben            ),(020,P,kay            ),(040,L,photography    ),(001,C,max            ),(0001,S,ben            ),(0001,P,kay            )

This is the output I am trying to produce in a char each[30] = {}

(000,P,ray            ),
(100,D,ray            ),
(009,L,art            ),
(000,C,max            ),
(000,S,ben            ),
(020,P,kay            ),
(040,L,photography    ),
(001,C,max            ),
(201,S,ben            ),
(301,P,kay            )

I copied the input to a char * temp so that strtok() does not change the input. But I am not understanding how to use strtok() inside the while loop condition. Does anyone know how to do it ?

Thanks,

UPDATE: Sorry if I have violated the rules. Here's my code -

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>


int main(int argc, char *argv[]){
  size_t len = 0;
  ssize_t read;
  char * line = NULL;
  char *eacharray;
  FILE *fp = fopen(argv[1], "r");
  char * each = NULL;
  while ((read = getline(&line, &len, fp)) != -1) {
    printf("%s\n", line);
    eacharray = strtok(line, ")");
//      printf("%s +\n", eacharray);
    while(eacharray != NULL){
        printf("%s\n", eacharray);
        eacharray = strtok(NULL, ")");
    }
  }
  return 0;
}

It produces an output like this -

(000,P,ray            
,(100,D,ray            
,(009,L,art            
,(0000,C,max            
,(0000,S,ben            
,(020,P,kay            
,(040,L,photography    
,(001,C,max            
,(0001,S,ben            
,(0001,P,kay
Kachu
  • 21
  • 4

2 Answers2

1

I would not use strtok, because your simple parser should first detect an opening brace and then search for a closing one. With strtok, you could just split at a closing brace; then you could not check if there was an opening one, and you'd have to skip the characters until the next opening brace "manually".

BTW: you probably meant each[10][30], not each[30].

See the following code looking for opening and closing braces and copying the content in between (including the braces):

int main(int argc, char *argv[]) {

    const char* source ="(000,P,ray            ),"
                        "(100,D,ray            ),"
                        "(009,L,art            ),"
                        "(0000,C,max            ),"
                        "(0000,S,ben            ),"
                        "(020,P,kay            ),"
                        "(040,L,photography    ),"
                        "(001,C,max            ),"
                        "(0001,S,ben            ),"
                        "(0001,P,kay            )";

    char each[10][30] = {{ 0 }};
    const char *str = source;
    int i;
    for (i=0; i<10; i++) {
        const char* begin = strchr(str, '(');
        if (!begin)
            break;

        const char* end = strchr(begin,')');
        if  (!end)
            break;

        end++;
        ptrdiff_t length = end - begin;
        if (length >= 30)
            break;

        memcpy(each[i], begin, length);

        str = end;
    }

    for (int l=0; l<i; l++) {
        printf("%s", each[l]);
        if (l!=i-1)
            printf(",\n");
    }
    putchar ('\n');
}

Hope it helps.

Stephan Lechner
  • 34,891
  • 4
  • 35
  • 58
  • 1
    I took license to prevent it from wrapping `:)` You may also want to add `{{0}}` as the initializer for `each`, and a final `putchar ('\n');` to make the output POSIX compliant. (and not mess up your prompt `:)` – David C. Rankin Jan 18 '18 at 21:04
  • Thank you. As you can see in the updated question. I am trying to read it from a file. So I actually dont know the size of the array, which you have assumed as 10. – Kachu Jan 18 '18 at 21:12
  • Do you *really* know the contract for `strncpy()`? – Deduplicator Jan 18 '18 at 21:14
  • @Deduplicator: at least I thought that I know it :-) What makes you wondering? – Stephan Lechner Jan 18 '18 at 21:16
  • Well, it's for putting a string, **without its terminator**, into a fixed-size-buffer, **and 0-padding** it. If that's not what you want to do (it isn't!), use `memcpy` or `strcpy`, depending on whether you have the length. Also interesting is the fairly common `strlcpy()`. – Deduplicator Jan 18 '18 at 21:18
  • @David: thanks! Should turn on all warnings ;-) – Stephan Lechner Jan 18 '18 at 21:28
  • @Deduplicator: agree. thanks! – Stephan Lechner Jan 18 '18 at 21:35
  • @StephanLechner Thanks. I am trying to remove `constant` from it as I will be reading it from a file. The functions `getline()` or `fgets()` requires the pointer to be `char *` and not `const char *`. How can I go about here ? – Kachu Jan 19 '18 at 17:13
1

There are many ways to approach this problem. Stephan has a good approach using the functions available in string.h (and kindly contributed the example source string). Another basic way to approach this problem (or any string parsing problem) is to simply walk-a-pointer down the string, comparing characters as you go and taking the appropriate action.

When doing so with multiple-delimiters (e.g. ',' and (...), it is often helpful to indicate the "state" of your position within the original string. Here a simple flag in (for inside or outside (...)) well let you control whether you copy characters to your array or skip them.

The rest is just keeping track of your indexes and protecting your array bounds as you loop over each character (more of an accounting problem from a memory standpoint -- which you should do regardless)

Putting the pieces together, and providing additional details in comments in-line below, you could do something like the following:

#include <stdio.h>

#define MAXS 10     /* if you need constants -- declare them */
#define MAXL 30     /*  (don't use 'magic numbers' in code)  */

int main (void) {

    const char* source ="(000,P,ray            ),"
                        "(100,D,ray            ),"
                        "(009,L,art            ),"
                        "(0000,C,max            ),"
                        "(0000,S,ben            ),"
                        "(020,P,kay            ),"
                        "(040,L,photography    ),"
                        "(001,C,max            ),"
                        "(0001,S,ben            ),"
                        "(0001,P,kay            )";

    char each[MAXS][MAXL] = {{0}},
        *p = (char *)source;
    int i = 0, in = 0, ndx = 0; /* in - state flag, ndx - row index */

    while (ndx < MAXS && *p) {  /* loop over all chars filling 'each' */
        if (*p == '(') {       /* (while protecting your row bounds) */
            each[ndx][i++] = *p;    /* copy opening '(' */
            in = 1;                 /* set flag 'in'side record */
        }
        else if (*p == ')') {
            each[ndx][i++] = *p;    /* copy closing ')' */
            each[ndx++][i] = 0;     /* nul-terminate */
            i = in = 0;             /* reset 'i' and 'in' */
        }
        else if (in) {              /* if we are 'in', copy char */
            each[ndx][i++] = *p;
        }
        if (i + 1 == MAXL) {    /* protect column bounds */
            fprintf (stderr, "record exceeds %d chars.\n", MAXL);
            return 1;
        }
        p++;    /* increment pointer */
    }

    for (i = 0; i < ndx; i++)   /* display results */
        printf ("each[%2d] : %s\n", i, each[i]);

    return 0;
}

(note: above, each row in each will be nul-terminated by default as a result of initializing all characters in each to zero at declaration, but it is still good practice to affirmatively nul-terminate all strings)

Example Use/Output

$ ./bin/testparse
each[ 0] : (000,P,ray            )
each[ 1] : (100,D,ray            )
each[ 2] : (009,L,art            )
each[ 3] : (0000,C,max            )
each[ 4] : (0000,S,ben            )
each[ 5] : (020,P,kay            )
each[ 6] : (040,L,photography    )
each[ 7] : (001,C,max            )
each[ 8] : (0001,S,ben            )
each[ 9] : (0001,P,kay            )

Get comfortable using either method. You can experiment whether using if.. else if.. or a switch best fits any parsing problem. The functions in string.h can be the better choice. It all depends on your input. Being comfortable with both approaches helps you better tailor your code to the problem at hand.


Example with getline and realloc of Rows

Since you are using getline to read each line, it will potentially read and allocate storage for an unlimited number of records (e.g. (...)). The way to handle this is to allocate storage for your records (pointers) dynamically, keep track of the number of pointers used, and realloc to allocate more pointers when you reach your record limit. You will need to validate each allocation, and understand you allocate each as a pointer-to-pointer-to-char (e.g. char **each) instead of each being a 2D array (e.g. char each[rows][cols]). (though you will still access and use the string held with each the same way (e.g. each[0], each[1], ...))

The code below will read from the filename given as the first argument (or from stdin if no argument is given). The approach is a standard approach for handling this type problem. each is declared as char **each = NULL; (a pointer-to-pointer-to-char). You then allocate an initial number of pointers (rows) for each with:

    each = malloc (rows * sizeof *each);    /* allocate rows no. of pointers */
    if (!each) {                            /* validate allocation */
        perror ("each - memory exhausted"); /* throw error */
        return 1;
    }

You then use getline to read each line into a buffer (buf) and pass a pointer to buf to the logic we used above. (NOTE, you must preserve a pointer to buf as buf points to storage dynamically allocated by getline that you must free later.)

The only addition to the normal parsing logic is we now need to allocate storage for each of the records we parse, and assign the address of the block of memory holding each record to each[x]. (we use strcpy for that purpose after allocating the storage for each record).

To simplify parsing, we originally parse each record into a fixed size buffer (rec) since we do not know the length of each record ahead of time. You can dynamically allocate/reallocate for rec as well, but that adds an additional level of complexity -- and I suspect you will struggle with the additions as they stand now. Just understand we parse each record from buf into rec (which we set at 256 chars #define MAXR 256 -- more than ample for the expected 30-31 char record size) Even though we use a fixed length rec, we still check i against MAXR to protect the fixed array bounds.

The storage for each record and copy of parsed records from rec to each[ndx] is handled when a closing ) is encountered as follows:

(note - storage for the nul-character is included in 'i' where you would normally see 'i + 1')

                each[ndx] = malloc (i); /* allocate storage for rec */
                if (!each[ndx]) {       /* validate allocation */
                    perror ("each[ndx] - memory exhausted");
                    return 1;
                }
                strcpy (each[ndx], rec);/* copy rec to each[ndx] */

(note: by approaching allocation in this manner, you allocate the exact amount of storage you need for each record. There is no wasted space. You can handle 1 record or 10,000,000 records (to the extent of the memory on your computer))

Here is your example. Take time to understand what every line does and why. Ask questions if you do not understand. This is the meat-and-potatoes of dynamic allocation and once you get it -- you will have a firm understanding of the basics for handling any of your storage needs.

#include <stdio.h>
#include <stdlib.h>     /* for malloc, realloc */
#include <string.h>     /* for strcpy */

#define ROWS  10    /* initial number of rows to allocate  */
#define MAXR 256    /* maximum record length between (...) */

int main (int argc, char **argv) {

    int in = 0;                     /* in - state flag */
    char **each = NULL,             /* pointer to pointer to char */
        *buf = NULL;                /* buffer for getline */
    size_t rows = ROWS,             /* currently allocated row pointers */
        ndx = 0,                    /* ndx - row index */
        n = 0,                      /* buf size (0 - getline decides) */
        i = 0;                      /* loop counter */
    ssize_t nchr = 0;               /* num chars read by getline (return) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    each = malloc (rows * sizeof *each);    /* allocate rows no. of pointers */
    if (!each) {                            /* validate allocation */
        perror ("each - memory exhausted"); /* throw error */
        return 1;
    }

    while ((nchr = getline (&buf, &n, fp) != -1)) { /* read line into buf */
        char *p = buf,                  /* pointer to buf */
            rec[MAXR] = "";             /* temp buffer to hold record */
        while (*p) {  /* loop over all chars filling 'each' */
            if (*p == '(') {            /* (while protecting your row bounds) */
                rec[i++] = *p;          /* copy opening '(' */
                in = 1;                 /* set flag 'in'side record */
            }
            else if (*p == ')') {
                rec[i++] = *p;          /* copy closing ')' */
                rec[i++] = 0;           /* nul-terminate */
                each[ndx] = malloc (i); /* allocate storage for rec */
                if (!each[ndx]) {       /* validate allocation */
                    perror ("each[ndx] - memory exhausted");
                    return 1;
                }
                strcpy (each[ndx], rec);/* copy rec to each[ndx] */
                i = in = 0;             /* reset 'i' and 'in' */
                ndx++;                  /* increment row index */
                if (ndx == rows) {      /* check if rows limit reached */
                    /* reallocate 2X number of pointers using tmp pointer */
                    void *tmp = realloc (each, rows * sizeof *each * 2);
                    if (!tmp) { /* validate realloc succeeded */
                        perror ("realloc each - memory exhausted");
                        goto memfull;   /* each still contains original recs */
                    }
                    each = tmp;         /* assign reallocated block to each  */
                    rows *= 2;          /* update rows with current pointers */
                }
            }
            else if (in) {              /* if we are 'in', copy char */
                rec[i++] = *p;
            }
            if (i + 1 == MAXR) {        /* protect column bounds */
                fprintf (stderr, "record exceeds %d chars.\n", MAXR);
                return 1;
            }
            p++;    /* increment pointer */
        }
    }
    memfull:;       /* labet for goto */
    free (buf);     /* free memory allocated by getline */
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (i = 0; i < ndx; i++) {     /* display results */
        printf ("each[%2zu] : %s\n", i, each[i]);
        free (each[i]);             /* free memory for each record */
    }
    free (each);        /* free pointers */

    return 0;
}

(note: since nchr isn't used to trim the '\n' from the end of the buffer read by getline, you can eliminate that variable. Just note that there is no need to call strlen on the buffer returned by getline as the number of characters read is the return value)

Example Use/Output

Note: for the input test, I just put your line of records in the file dat/delimrecs.txt and copied it 4 times giving a total of 40 records in 4 lines.

$ ./bin/parse_str_state_getline <dat/delimrecs.txt
each[ 0] : (000,P,ray            )
each[ 1] : (100,D,ray            )
each[ 2] : (009,L,art            )
each[ 3] : (0000,C,max            )
each[ 4] : (0000,S,ben            )
<snip 5 - 34>
each[35] : (020,P,kay            )
each[36] : (040,L,photography    )
each[37] : (001,C,max            )
each[38] : (0001,S,ben            )
each[39] : (0001,P,kay            )

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/parse_str_state_getline <dat/delimrecs.txt
==13035== Memcheck, a memory error detector
==13035== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13035== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13035== Command: ./bin/parse_str_state_getline
==13035==
each[ 0] : (000,P,ray            )
each[ 1] : (100,D,ray            )
each[ 2] : (009,L,art            )
each[ 3] : (0000,C,max            )
each[ 4] : (0000,S,ben            )
<snip 5 - 34>
each[35] : (020,P,kay            )
each[36] : (040,L,photography    )
each[37] : (001,C,max            )
each[38] : (0001,S,ben            )
each[39] : (0001,P,kay            )
==13035==
==13035== HEAP SUMMARY:
==13035==     in use at exit: 0 bytes in 0 blocks
==13035==   total heap usage: 46 allocs, 46 frees, 2,541 bytes allocated
==13035==
==13035== All heap blocks were freed -- no leaks are possible
==13035==
==13035== For counts of detected and suppressed errors, rerun with: -v
==13035== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

This is a lot to take in, but this is a basic minimal example of the framework for handling an unknown number of objects.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • That's not an issue at all. Just open your file `FILE *fp = fopen ("filename", "r");` validate it succeeded `if (!fp) { perror ("filename"); return 1; }` and then read your file `while (fgets (buf, sizeof buf, fp) { /* do what you need with buf */ }`. There are a million examples on SO. – David C. Rankin Jan 19 '18 at 17:20
  • Thank you. I am trying to read the input from a file. So I dont know `MAXS` before hand. I tried doing this - `read = getline(&line, &len, fp)` and then `temp = line; while(*temp!='\0'){ if(*temp++ == ')') MAXS++; } ` . But it is throwing an error - `error: variable-sized object may not be initialized char each[MAXS][MAXL] = {{0}}, ^ ` and a couple of warnings. – Kachu Jan 19 '18 at 17:58
  • I dont know the length of the source. So I need to do it without the macros and need to initialize the `char` array after reading from the file. – Kachu Jan 19 '18 at 18:17
  • Hold on. `getline` is a good choice, I'll post an example. (and just FYI `#define MAXS 10`, etc.. are definitions of constants, not macros. – David C. Rankin Jan 19 '18 at 18:20
  • How long did it take to write this down? A really good and elaborate answer – user2736738 Jan 20 '18 at 17:30
  • 1
    It probably took the better part of an hour. You can just kind of tell when a person is stuck, and needs a little help to be able to grasp some of remaining concepts he needs for the light-bulb to come on. Those are the answer that end up being longer... – David C. Rankin Jan 20 '18 at 21:53