It's pretty basic: I have some data in the form (xxxxx). To remove the brackets I do:

long len = strlen(data);
for (int  i = 0; i < len - 2; ++i )
    data[i] = data[i + 1];
data[len - 2] = '\0';

Which works. I would also like to handle another case, ((xxxxxx)), by removing the double "((" at the start and the "))" at the end (without removing any occurrence in the middle).

Assume that for every "((" at the start there is also a "))" at the end.

Is there a simple way to modify this code (or use other code) to do that with high efficiency?

EDIT: This is data parsing, where there are 3 options:

xxxxxx, (xxxxx), ((xxxxx)

Usually I check whether the first character is "(" and apply this function. Now I need that function to also check whether there is another "(" and, if there is, remove it automatically. I cannot apply the function twice, because if there is only one "(" I will lose data.

Curnelious
  • Apply your function twice? – EOF Oct 31 '16 at 20:35
  • Note: `()` are commonly called parentheses ("parens"). This is compliant with the C standard. – too honest for this site Oct 31 '16 at 20:36
  • Provide some more detail. Is this a general string parsing problem? Could there be parenthesis which are not eligible for removal (like being commented out)? Or is it sufficient to copy the string(?) and remove every `(` and `)`? – wallyk Oct 31 '16 at 20:38
  • @Olaf thanks for the note, I have to improve my English. EOF, I cannot apply the function twice because if it's not the "((" case, I will lose data. – Curnelious Oct 31 '16 at 20:38
  • @wallyk Thanks, I have edited the question. So you are saying I should run a loop, check the first character, and as long as it's "(", apply this function? – Curnelious Oct 31 '16 at 20:41
  • Curnelious @EOF is right - if you have a function that removes matching pairs and tells you if it did, you can keep calling it until it says "no". – Weather Vane Oct 31 '16 at 20:43
  • Assuming there are no parentheses in the middle: `strtok( data, "()" )` will return a pointer to the string with the parentheses removed. – user3386109 Oct 31 '16 at 20:48
  • Earlier you mentioned `((xxxxx))` but after the edit it is unclear what you want to happen with that, and with `((xxxxx)`. – Weather Vane Oct 31 '16 at 20:49
  • Gonna go out on a limb and suggest that 'C' may not be the preferred language for doing string parsing. Just in your sample code, I see possible issues where you aren't looking at the last character in your string, where your data, if not a string reads forever, etc. – Michael Dorgan Oct 31 '16 at 20:52
  • @MichaelDorgan: Done correctly, there is nothing wrong using C. Problem is the syntax OP wants is not really clear. – too honest for this site Oct 31 '16 at 21:30
  • @EOF, the weakness to calling the function twice is that the shifting of the interior of the string may be done twice (or x times, depending on OP's goal). To be efficient, better to minimize the shifting of data by examining the edges first. – chux - Reinstate Monica Oct 31 '16 at 21:41
  • @chux That's right, so what do you think about my answer? – Curnelious Oct 31 '16 at 21:43
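
The suggestion in the comments (a function that removes one matching pair and reports whether it did, called until it says "no") could be sketched roughly as follows. The name `strip_one_pair` is hypothetical and not from the thread, and the sketch assumes `data` is a writable, NUL-terminated string. As noted above, each successful call shifts the interior again, so this favors simplicity over speed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: removes one outer "(" ... ")" pair in place.
   Returns true if a pair was removed, false if the string was left alone. */
static bool strip_one_pair(char *data)
{
    size_t len = strlen(data);

    /* Need at least "()" and a matching pair at the two ends. */
    if (len < 2 || data[0] != '(' || data[len - 1] != ')')
        return false;

    memmove(data, data + 1, len - 2);  /* shift the interior left by one */
    data[len - 2] = '\0';              /* drop the closing parenthesis */
    return true;
}
```

A caller would then loop with `while (strip_one_pair(buf)) ;`, which leaves a string without an outer pair untouched.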

6 Answers

1

Assuming your data is always longer than 3 chars...maybe something like this?

long len = strlen(data);
int ini = 0, end = 0;
// you know it starts with parenthesis
if ( data[0] == '(') {
    // check if it has double parenthesis at the beginning
    if ( data[1] == '(') {
        ini = 2;
        end = len - 2;
    }
    else {
        // It had only one parenthesis
        ini = 1;
        end = len -1;
    }
    // replace the data you need
    int  i = 0;
    for ( ; ini < end; ++i, ini++ )
        data[i] = data[ini];
    // "close" the "string"
    data[i] = '\0';
}

EDIT: Thanks chqrlie, missed that part

Community
  • `data[i] = '\0';` references `i` outside the scope of the `for`. Declare `i` outside the `for` statement. – chqrlie Oct 31 '16 at 21:22
1

Assuming this is only for parens at the start and end of the string...

  • Stop if there are fewer than two characters.
  • Check for balanced parens.
    • Nudge the string in at both ends.
    • Recurse.

char *strip_parens(char *string) {
    size_t len = strlen(string);
    if( len < 2 ) {
        return string;
    }

    char start = string[0];
    char end   = string[len-1];

    if( start == '(' && end == ')' ) {
        string[len-1] = '\0';
        string += 1;

        return strip_parens(string);
    }
    else {
        return string;
    }
}

This avoids having to copy the whole string, but the original pointer has to be retained for deallocation: if the original is lost the memory leaks, and passing the advanced pointer to free() is undefined behavior. This is most useful when the string is large and the stripped version is only for temporary use.

char *stripped = strip_parens(string);

/* do something with stripped */

free(string);  /* not stripped */

Alternatively you can put them both into a struct and manage the struct.
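
A minimal sketch of that struct idea, under stated assumptions: the type `stripped_string` and both function names are hypothetical, the string is assumed to come from `malloc` or `strdup`, and a loop is used here in place of the recursion above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: `base` is the pointer the allocator returned,
   `text` points inside the same buffer, past any stripped "(". */
struct stripped_string {
    char *base;
    char *text;
};

/* Takes ownership of a heap-allocated string and strips matched outer parens. */
static struct stripped_string stripped_string_take(char *heap_string)
{
    struct stripped_string s = { heap_string, heap_string };
    size_t len = strlen(s.text);

    while (len >= 2 && s.text[0] == '(' && s.text[len - 1] == ')') {
        s.text[len - 1] = '\0';  /* cut the closing paren */
        s.text += 1;             /* step over the opening paren */
        len -= 2;
    }
    return s;
}

/* Frees the original allocation, never the advanced pointer. */
static void stripped_string_free(struct stripped_string *s)
{
    free(s->base);
    s->base = NULL;
    s->text = NULL;
}
```

Callers read through `text` and release through `stripped_string_free`, so the two pointers cannot be mixed up by accident.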

Anything more complicated and you should look into using regular expressions or a grammar.

Schwern
1

Look at the beginning and at the end of the string for a pair of (). Keep looking until a pair is not found (or limit to 2 matches if desired).

Efficiency: one pass down the string to find its length, one call to shift the interior of the string. O(n).

char *C_RemoveOuterBrackets(char *data) {
  char *src = data;
  size_t len = strlen(src);
  // Matching pair of () found
  while (src[0] == '(' && src[len - 1] == ')') {
    src++;
    len -= 2;
  }
  memmove(data, src, len);
  data[len] = '\0';
  return data;
}

Handles "(())", "" and does not need to assume the ends are matched (extra parens remain).

chux - Reinstate Monica
  • regarding efficiency, you might consider an extra test to avoid modifying the string if `data == src`. – chqrlie Oct 31 '16 at 21:20
  • @chqrlie True, could start the function with `if (data[0] == '(') { ...` for that case. OTOH, `memmove()` may have its own `len==0` early return. – chux - Reinstate Monica Oct 31 '16 at 21:24
  • an initial `if (*data == '(') {` would be best IMHO. – chqrlie Oct 31 '16 at 21:34
  • @chqrlie Note: [Apple](http://opensource.apple.com//source/ntp/ntp-13/ntp/libntp/memmove.c) and not some [others](http://stackoverflow.com/a/13339861/2410359) have an early exit. – chux - Reinstate Monica Oct 31 '16 at 21:47
  • Indeed, Apple does, both the `len==0`, and the `dst==src`, although you linked to the source code for ntp, not the C library. The GNU C version is horrible: it copies backwards if `dst==src`, in spite of the comment that promises to use the forward copy whenever possible. – chqrlie Oct 31 '16 at 22:14
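
Two points from this answer and its comments (limiting the number of matches, and leaving the string untouched when nothing was stripped) could be combined in a variant along these lines. The function name and the `max_pairs` parameter are hypothetical additions, not part of the answer above; this is only a sketch.

```c
#include <string.h>

/* Hypothetical variant: strips at most max_pairs matched outer "()" pairs,
   and skips the move entirely when no pair was found. */
char *remove_outer_parens_limited(char *data, size_t max_pairs)
{
    char *src = data;
    size_t len = strlen(src);
    size_t pairs = 0;

    /* Examine only the two ends; the interior is not touched here. */
    while (pairs < max_pairs && len >= 2 &&
           src[0] == '(' && src[len - 1] == ')') {
        src++;
        len -= 2;
        pairs++;
    }

    if (src != data) {            /* at least one pair was found */
        memmove(data, src, len);  /* single shift of the interior */
        data[len] = '\0';
    }
    return data;
}
```

With `max_pairs` set to 2 this covers the three input forms from the question, and a string with three outer pairs keeps its innermost one.
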
0

You could have a function to count the parentheses. That way it gives the desired (I guess) answer for any number of parentheses.

int countPar(char *s)
{
    int num = 0, i;
    for(i = 0; (i < strlen(s)) && (s[i] == '('); i++)
        num++;
    return num;
}

Then you just do the following

size_t len = strlen(data);
int num = countPar(data);
// assumes the string ends with at least num closing parentheses
for (int  i = 0; i < strlen(data) - 2 * num; ++i )
    data[i] = data[i + num];
data[len - 2 * num] = '\0';

Feel free to correct me; I did not actually try it, just counted.

elikatsis
  • Quite inefficient, and not careful enough: it invokes undefined behavior on `"("` – chqrlie Oct 31 '16 at 21:24
  • What makes both code fragments vastly inefficient is your computing the length of the string for each iteration test. In the first case, it is completely useless to even use `strlen(s)`: try `{ int i = 0; while (s[i] == '(') i++; return i; }`. Or just `num = strspn(data, "(");` – chqrlie Oct 31 '16 at 21:41
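
Taking the `strspn` suggestion from the comment above, a guarded version of the counting approach might look like the sketch below. The function name is hypothetical, and the rule of stripping only as many pairs as both ends can supply is an assumption made here so that inputs such as `"("` stay well defined.

```c
#include <string.h>

/* Hypothetical helper: counts leading '(' and trailing ')' once each,
   then strips min(open, close) characters from both ends in place. */
void strip_counted_parens(char *data)
{
    size_t len = strlen(data);          /* computed once, not per iteration */
    size_t open = strspn(data, "(");    /* number of leading '(' */
    size_t close = 0;

    /* Count trailing ')' without walking back into the leading run. */
    while (close < len - open && data[len - 1 - close] == ')')
        close++;

    size_t num = open < close ? open : close;

    /* 2 * num <= open + close <= len, so the subtraction cannot wrap. */
    memmove(data, data + num, len - 2 * num);
    data[len - 2 * num] = '\0';
}
```

Unmatched extras stay in place: `"((a)"` becomes `"(a"`, and `"("` is left unchanged.
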
0

Following EOF's comment:

void strip()
{
    long len = strlen(data);
    for (int  i = 0; i < len - 2; ++i )
        data[i] = data[i + 1];
    data[len - 2] = '\0';

    //RECURSIVE
    if( data[0] == '(' )
        strip();
}

Curnelious
  • What if the data does not have any parentheses? – chqrlie Oct 31 '16 at 21:35
  • As I wrote, in that case I will not call this function to begin with. – Curnelious Oct 31 '16 at 21:38
  • OK, I did not notice you were answering your own question... you should modify it to make it compilable. Why not make the function work in all cases and call it directly? Also use a loop instead of a silly tail recursion and do not use `long` for `len`. – chqrlie Oct 31 '16 at 21:47
  • @Curnelious As [requested](http://stackoverflow.com/questions/40350154/removing-a-double-brackets#comment67956615_40350154), `long` is not the best type here. `size_t` is the Goldilocks type to use for array indexing, not too wide, not too narrow. Code is strangely passing data as a global. The recursive call invokes `strlen(data)` multiple times - inefficient and shifting the interior part of the string perhaps twice - inefficient. With `"("`, `data[len - 2] = '\0';` is UB. – chux - Reinstate Monica Oct 31 '16 at 21:53
-1

Here's the most elegant way (I can think of right now) to do so:

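#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main()
{
  char *data = strdup("((hello))");
  char *whead, *wtail;
  size_t len;

  if (data == NULL)
    return (1);
  len = strlen(data);
  whead = data;
  wtail = data + len - 1;
  /* strip one matched pair per iteration, working inward from both ends */
  while (*whead == '(' && *wtail == ')')
    {
      *wtail = '\0';
      whead++;
      wtail--;
      len -= 2;  /* each pair removes two characters */
    }
  memmove(data, whead, len + 1);  /* + 1 copies the terminating '\0' */
  printf("%s\n", data);
  free(data);
  return (0);
}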

If you're sure you have at least as many closing parentheses as you have opening ones, you can go with the following code:

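#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main()
{
  char *data = strdup("((hello))");
  size_t i;
  size_t len;

  if (data == NULL)
    return (1);
  len = strlen(data);
  /* count the leading '(' */
  for (i = 0; data[i] == '('; i++)
    ;
  /* relies on the assumption above: at least i closing parentheses at the end */
  memmove(data, data + i, len - (i * 2));
  data[len - (i * 2)] = '\0';
  printf("%s\n", data);
  free(data);
  return (0);
}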
yoones
  • Assuming the parentheses to be balanced is risky. The programmer might be sure, but he might be mistaken... Your code would invoke undefined behavior (of the worst kind) on the simple string `"("`. The first function fails to copy the `'\0'` byte if len was decremented, you should write `memmove(data, whead, len + 1);` – chqrlie Oct 31 '16 at 21:26
  • Also note that `len` should be a `size_t`. Some weird architectures have 32-bit `long` and 64-bit `size_t`... A sadly common choice as a matter of fact. – chqrlie Oct 31 '16 at 21:32