How to separate terms between parentheses and list them in C

Question

I have the following problem, I have a string that comes with x terms, and some are in parentheses. I need to be able to separate the terms that are in parentheses and place them in the nodes of a list:

The first node in the list contains the entire character string, then the terms that are in parentheses are stored in the following nodes.

Something like this: String: a*(b+(c+j)-h)*2 + (a+(b-c))

So, in list: Firts node :

a*(b+(c+j)-h)*2+(a+(b-c))->(b+(c+j)-h)->(c+j)->(a+(b-c))->(b-c)

I need ideas to implement the algorithm of separation of terms, I am stuck with that part. I have already implemented all the lists part. The function they ask me for is

void separate (char * s)

@Someprogrammerdude Yes, it seems good recursion. But how can I parse the string and that it detects me from the first parenthesis until the one that closes that parenthesis. It confuses me that if the first parenthesis searches for ')' there may be another term of parenthesis (as there is in the example). — Renzo, Jul 27 '19 at 17:41
Presumably, a recursive function for the purpose would return information about how much of the string had been parsed, so that its caller could\ pick up with the remainder of the string, if any. That implies that it would not necessarily be `separate()` itself that is recursive, but rather some helper function that it calls. — John Bollinger, Jul 27 '19 at 17:47
"I have already implemented all the lists part" --> post that code. — chux - Reinstate Monica, Jul 27 '19 at 19:17
@chux I didn't find it necessary to post that part. They are all list primitives, create list, list_insert, list_append, create_node.. I only need the algorithm / code, to separate a string between parentheses. If given a function it returns a substring with what is inside the two parentheses I can insert it in my list. And then I can call that function back with recursion as they told me above. That's what I'm thinking of doing, I don't know if it's all right. If there is any better idea, welcome be — Renzo, Jul 27 '19 at 19:31

score 1 · Answer 1 · answered Jul 28 '19 at 10:57

John Bollinger recommended:

Presumably, a recursive function for the purpose would return information about how much of the string had been parsed, so that its caller could\ pick up with the remainder of the string, if any. That implies that it would not necessarily be separate() itself that is recursive, but rather some helper function that it calls.

OP already sketched (in comment):

If given a function it returns a substring with what is inside the two parentheses I can insert it in my list. And then I can call that function back with recursion as they told me above.

This is where I started.

There come various methods in mind to build substrings:

modify original string by overwriting a character with a delimiter '\0' (may be, temporarily) – like done e.g. in strtok()
allocate memory for substrings and copy portions of original string (using something like e.g. strdup())
just return length (and, may be, start) of substring

I used the third option.

#include <assert.h>
#include <stdio.h>
#include <string.h>

void add_term(char *s, size_t len)
{
  printf("'%.*s'\n", (int)len, s);
}

size_t find_term(char *s, size_t len)
{
  assert(*s == '(');
  int n = 1;
  size_t i = 1;
  for (; i < len && n; ++i) n += (s[i] == '(') - (s[i] == ')');
  return n ? fprintf(stderr, "ERROR!\n"), 0 : i;
}

void separate_terms(char *s, size_t lenTotal, int addTotal)
{
  for (size_t i = 0; i < lenTotal;) {
    switch (s[i]) {
      case '(': {
        const size_t len = find_term(s + i, lenTotal - i);
        if (!len) return;
        if (len < lenTotal + addTotal) add_term(s + i, len);
        separate_terms(s + i + 1, len - 2, 1);
        i += len;
      } break;
      case ')': fprintf(stderr, "ERROR!\n"); return;
      default: ++i;
    }
  }
}

void separate(char *s)
{
  size_t lenTotal = strlen(s);
  printf("Term: "); add_term(s, lenTotal);
  separate_terms(s, lenTotal, 0);
}

int main(void)
{
  // sample of OP:
  separate("a*(b+(c+j)-h)*2 + (a+(b-c))");
}

Output:

Term: 'a*(b+(c+j)-h)*2 + (a+(b-c))'
'(b+(c+j)-h)'
'(c+j)'
'(a+(b-c))'
'(b-c)'

Notes:

OP stated that storage of results in lists (with appropriate management) is already solved. Hence, I used simple console output instead (in add_term()).

find_term() is simply counting opening and closing parentheses until the closing parenthesis corresponding to the first opening is found (success) or the string end is reached (error case).

separate_terms() is the recursive descent scanner to find sub-terms in a term.

It iterates over characters of input string. Every time an opening parenthesis (() is found, it calls find_term() to determine the length of sub-term and add the result. The found sub-term is analized recursively by calling itself recursively to the restricted length. Afterwards, the length of subterm is skipped because its fully analyzed now.

While fiddling with edge cases, I found out that my initial (significantly shorter) version reported a duplicated term when the complete input term was in parentheses (e.g. (a + b)). Hence, I added a flag addTotal to denote whether a found subterm shall be added if it has the total length of input.

separate() is rather short. It's just a wrapper to add a term for ful input and to perform the toplevel call of the recursive separate_terms(). (0 is passed for addTotal to prevent duplicate adding of complete input.)

I made some other samples which I considered as edge cases for my solution to check for potential bugs:

  // valid edge cases:
  separate(""); // valid?
  separate("()"); // valid?
  separate("a");
  separate("(a + b)");
  separate("(a + b) * 2");
  separate("(a + b) * (c - d)");
  separate("((a + b))");
  // invalid cases:
  separate("(");
  separate("((");
  separate(")");
  separate("))");
  separate("())(()");
  separate("a(a(a");
  separate("((a)))");

Output:

Term: ''
Term: '()'
Term: 'a'
Term: '(a + b)'
Term: '(a + b) * 2'
'(a + b)'
Term: '(a + b) * (c - d)'
'(a + b)'
'(c - d)'
Term: '((a + b))'
'(a + b)'
Term: '('
ERROR!
Term: '(('
ERROR!
Term: ')'
ERROR!
Term: '))'
ERROR!
Term: '())(()'
'()'
ERROR!
Term: 'a(a(a'
ERROR!
Term: '((a)))'
'((a))'
'(a)'
ERROR!

Live Demo on coliru

gsamaras · Accepted Answer · 2019-07-28T12:53:09.363

Recursion is a common way to parse nested structures, as @SomeProgrammerDude commented.

My intuition for the algorithm that you are looking for is this:

1) Recursively call the separate method.

2) Every call should parse one term enclosed in parentheses, and insert it at the end of our list.

3) The method has to two modes: iterate over the input string and iterate over the input string while filling up the term.

We are looking for the first occurrence of an opening parenthesis; once this is found, the method enables the "read" mode, and starts copying everything in its path to a local buffer, until the closing parentheses is found.

When that happens, the method disables the "read" mode, flagging that the term is now ready for insertion in the list.

4) How to bypass nested terms? Once the 1st opening parenthesis is found, maintain a counter of parentheses, which is adjusted according to the number of parentheses met (+1 for an opening parenthesis, -1 for a closing one).

If an opening parenthesis is met, the counter gets incremented by 1. If the counter is greater than 1, that signals us that a nested term is found, and thus we shouldn't stop when its closing parentheses is met, but instead continue, while decrementing the counter by 1.

5) Once a term is inserted, then we should recursively call our separate method, where now the input string should be the already parsed input, starting from the character after the first opening parenthesis that we found.

6) Base case of the recursion? When all the input is consumed, and we reached the NULL string terminator.

Full code example (uncomment print statements that show the status as the program progresses):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct listnode *Listptr;

struct listnode {
  char* value;
  Listptr next;
};

void insert_at_end(Listptr *, char *);
void print(Listptr);
void free_list(Listptr *);

void separate(Listptr *list, char *s)
{
  if(!*s)
    return;
  int parentheses = 0, read = -1, i = 0;
  char term[16] = {0}, *opening_parenthesis;
  //printf("Before while input is %s\n", s);
  while(*s)
  {
    //printf("#parentheses = %d, read = %d, i = %d, term = %s\n", parentheses, read, i, term);
    if(*s == '(')
    {
      if(parentheses == 0)
        opening_parenthesis = s;
      read = 1;
      parentheses++;
    }
    else if(*s == ')')
    {
      if(parentheses > 1)
        parentheses--;
      else if(parentheses == 1)
      {
        term[i++] = *s;
        read = 0;
      }
    }
    if(read == 1)
    {
      term[i++] = *s;
      //puts(term);
    }
    else if (read == 0)
    {
      insert_at_end(list, term);
      separate(list, opening_parenthesis + 1);
      return;
    }
    s++;
  }

}

int main(void)
{
  char buf[128];
  fgets(buf, sizeof(buf), stdin);
  // eat trailing newline of fgets
  if(!strlen(buf))
  {
    printf("No input, exiting..\n");
    return EXIT_FAILURE;
  }
  // eat trailing newline of fgets
  buf[strlen(buf)] = '\0'; 
  // Goal: a*(b+(c+j)-h)*2+(a+(b-c))->(b+(c+j)-h)->(c+j)->(a+(b-c))->(b-c)
  Listptr list = NULL;
  insert_at_end(&list, buf);
  //printf("First node: "); print(list);

  separate(&list, buf);

  print(list);
  free_list(&list);
  return EXIT_SUCCESS;
}

void insert_at_end(Listptr *ptraddr, char* str)
{                    /* Insert str as last element of list *ptraddr */
  while (*ptraddr != NULL)                     /* Go to end of list */
    ptraddr = &((*ptraddr)->next);/* Prepare what we need to change */
          /* In real application, check for NULL pointer of malloc! */
  *ptraddr = malloc(sizeof(struct listnode)); /* Space for new node */
     /* Space for string. Length of str + 1 for the null terminator */
  (*ptraddr)->value = malloc((strlen(str) + 1) * sizeof(char));
  strcpy((*ptraddr)->value, str);                      /* Copy str  */
  (*ptraddr)->next = NULL;              /* There is no next element */
}

void print(Listptr list)                  /* Print elements of list */
{ 
  while (list != NULL) {       /* Visit list elements up to the end */
    printf("%s--> ", list->value);         /* Print current element */
    list = list->next;                        /* Go to next element */
  }
  printf("NULL\n");                            /* Print end of list */
}

void free_list(Listptr *ptraddr)
{ 
  Listptr templist = *ptraddr;
  while(*ptraddr != NULL) {
      *ptraddr = (*ptraddr)->next;
      free(templist->value);
      free(templist);
      templist = *ptraddr;
  }
  *ptraddr = NULL; 
}

Output (for your input):

a*(b+(c+j)-h)*2 + (a+(b-c))--> (b+(c+j)-h)--> (c+j)--> (a+(b-c))--> (b-c)--> NULL

Note1: I altered the prototype of your separate method to allow for the list to be passed also as a parameter. I guess in your implementation the list is a global variable or something, which in general should be avoided.

Note2: It could have been implemented without the usage of the read flag, by using just the variable for the number of parentheses to determine when to read (when that number positive), and you could assign a reserved value (-1 for example), to designate when reading is done, and the term is ready for list insertion.

Assumption1: Term's length is 15 characters at most. The code should typically check that this length is not surpassed - not done in my implementation.

Assumption2: It is assumed that term is valid, i.e. it contains the same number of opening and closing parentheses. In a real-world scenario one should sanitize the input.

Live demo

PS: I have the list implementation in strList(C) also.

Thank you very much! It was very useful your answer – Renzo Jul 29 '19 at 19:40 — Renzo, Jul 29 '19 at 19:40

How to separate terms between parentheses and list them in C

2 Answers2