Store text between two characters in an array

Question

I have the char array char txt[80] = "Some text before $11/01/2017$"; and I need to copy the content between the two $ into a string, which would be 11/01/2017. How can I do this with the <string.h> functions?

Have you tried a smart combination of [malloc](https://man7.org/linux/man-pages/man3/free.3.html) and a while loop? What have you tried? — Hollyol, Jan 21 '21 at 17:18
Find [the first](https://en.cppreference.com/w/c/string/byte/strchr) and [the last](https://en.cppreference.com/w/c/string/byte/strrchr) `'$'` in the string. Then get the length of the text between those two points, and [copy](https://en.cppreference.com/w/c/string/byte/strncpy) from the first into a new array. — Some programmer dude, Jan 21 '21 at 17:19
Why not use a simple `for` loop, reading and copying characters? — Damien, Jan 21 '21 at 17:23

David C. Rankin · Accepted Answer · 2021-01-21T20:15:52.273

A simple way to obtain a single token without modifying the original string is to use two calls to strcspn(): the first establishes a start-pointer to the first delimiter ('$' in this case), and the second an end-pointer to the last character in the token (the character before the second '$', or the last character of the string if no second '$' is present). You then validate that characters exist between the start-pointer and end-pointer and use memcpy() to copy the token.

A short example is:

#include <stdio.h>
#include <string.h>

int main (void) {
    
    char txt[80] = "Some text before $11/01/2017$",
        *sp = txt + strcspn (txt, "$"),                 /* start ptr to 1st '$' */
        *ep = sp + strcspn (*sp ? sp + 1 : sp, "$\n"),  /* end ptr to last c in token */
        result[sizeof txt] = "";                        /* storage for result */
    
    if (ep > sp) {                                      /* if chars in token */
        memcpy (result, sp + 1, ep - sp);               /* copy token to result */
        result[ep - sp] = 0;                            /* nul-terminate result */
        printf ("%s\n", result);                        /* output result */
    }
    else
        fputs ("no characters in token\n", stderr);
}

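(note: the ternary handles the case where no '$' is found, including when txt is the empty-string: sp then points at the nul-terminator, and reading from sp + 1 would go past the end of the string. The '\n' is added to the second delimiter set to handle strings passed from fgets() or POSIX getline(), where no second '$' is present and '\n' is the last character in the string.)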

It also works with any combination of empty-string, zero, one or two '$', and since it does not modify the original, it is safe for use with String-Literals.

Example Use/Output

$ ./bin/single_token
11/01/2017

Let me know if you have additional questions.

Variation Allowing Valid Empty-String as Result

A neat improvement suggested by @chqrlie: testing (*sp == '$') instead of (ep > sp) allows the empty-string (no characters in token) to be a valid result, and I agree. The change would be:

    if (*sp == '$') {                                   /* if 1st '$' found */
        memcpy (result, sp + 1, ep - sp);               /* copy token to result */
        result[ep - sp] = 0;                            /* nul-terminate result */
        printf ("%s\n", result);                        /* output result */
    }

So if you want to consider an empty token (like an empty field in a .csv, e.g. "one,,three,four") to be a valid token, use this alternative.

edited Jan 21 '21 at 20:15

answered Jan 21 '21 at 19:50

David C. Rankin


Combining all variable definitions in a single multi-line declaration as posted reminds me of the eighties :) Also, you do not distinguish between ill-formatted strings and those with an empty token like `"$$"`. – chqrlie Jan 21 '21 at 19:53
I started driving in 1981 `:)` (trivia -- bought a 1968 SS 386 El Camino for $1,200 -- the good old days when you could do an oil change for less than $5) – David C. Rankin Jan 21 '21 at 19:54
The check on `ep - sp` eliminates all ill-formed strings such as `""`, `"$"` or `"$$"`; you can add an `else` clause to output a diagnostic if wanted. I guess `result` should be initialized as an empty-string with `result = "";` to prevent conceivable misuse of it later. – David C. Rankin Jan 21 '21 at 19:56
That's exactly my point: why consider `"$$"` to be ill formed? The token is empty but present. – chqrlie Jan 21 '21 at 19:58
Okay, so you are saying `"$$"` would result in a valid empty-string -- that makes sense. – David C. Rankin Jan 21 '21 at 19:58
Yes. The test `if (ep > sp)` should be `if (*ep == '$')`. Whether embedded newlines are allowed in a token is also debatable, but teaching the OP about `strcspn()` deserves kudos! – chqrlie Jan 21 '21 at 20:00
Only problem there is `*ep == '$'` would then require both `'$'` to be present. I wanted to explicitly accept the token if the second `'$'` was absent. – David C. Rankin Jan 21 '21 at 20:02
Then `if (*sp == '$')` does the job. – chqrlie Jan 21 '21 at 20:03
Okay, I see what you are saying (I think): you are not saying replace `(ep > sp)` entirely, but add `if (*ep == '$')` as an initial check to catch `"$$"`, allowing a valid empty-string, and then use `else if (ep > sp)` for the remaining cases. Otherwise, I am not understanding your thought process. But the problem I have with that is that `"$"` also makes `*ep == '$'` true, which would catch the empty-string, no-2nd-delimiter case as well. – David C. Rankin Jan 21 '21 at 20:06
Since you are willing to accept a missing second `$`, the test for success is simply that an initial `$` has been found. Hence simply replace `if (ep > sp)` with `if (*sp == '$')` or even just `if (*sp)` – chqrlie Jan 21 '21 at 20:09
Now that makes sense -- Bingo -- light-bulb on. Yes that is a clean way to do it in that case. – David C. Rankin Jan 21 '21 at 20:09
The intent was to have `ep` point to the character before the second delimiter, if present, so `strcspn (*sp ? sp + 1 : sp, "$\n")` is well-formed for all cases. In the case of `"$"` `ep` just points to `'$'`. – David C. Rankin Jan 21 '21 at 20:24
Well, that's a bit contorted but does work. – chqrlie Jan 21 '21 at 20:33

vmp · Answer 2 · 2021-01-21T17:33:46.247

Suppose you are sure that you have 2 $ in your string... You could do the following:

char *first_dollar = strchr(txt, '$'); //get position of first dollar from the start of string
char *second_dollar = strchr(first_dollar + 1, '$'); //get position of the next dollar, searching
                                                    // from one position after the first dollar
char tocopy[20];
*second_dollar = '\0'; //change the value of last dollar to '\0'
strcpy(tocopy, first_dollar + 1); //copy into the place you want
*second_dollar = '$'; // put back the second dollar

If you are not sure that both $ are present in your string, you should check the return value of strchr(), which will be NULL when the character is not found.

Is it mandatory to use <string.h>? There is a clever way using sscanf:

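char txt[80] = "Some text before $11/01/2017$";
char t[20];
/* %19[^$] stores at most 19 chars plus the '\0'; sscanf returns 1 on success */
if (sscanf(txt, "%*[^$]$%19[^$]", t) == 1)
    printf("ORIGINAL TEXT: %s\nEXTRACTED TEXT: %s\n", txt, t);
else
    fputs("no token found\n", stderr);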

The format in the sscanf() call means the following:

Ignore all characters that are not $ (%*[^$]);
Ignore one $ ($);
Read all characters until the next $ and store them in t (%[^$]).

There are plenty of solutions that do not modify the source string; modifying it has undefined behavior (typically a crash) if a string literal is passed. Furthermore, the `sscanf()` approach cannot handle this case: `"$$"`, because `%[^$]` must match at least one byte. — chqrlie, Jan 21 '21 at 19:49

Itati · Answer 3 · 2021-01-22T02:15:31.240

I don't know why you need to use string.h.

For your reference, here is a method that only uses strlen() from string.h and otherwise scans the characters with plain loops.

Update

#include <stdio.h>
#include <string.h>
#include <stdbool.h> // bool, true, false (needed before C23)
int main(){
  char txt[80] = "Some text before $21/01/2017$ and $32/01/2017$ and $$ end $abc$";
  char get[80] = { '\0' };
  int i = 0, k = -1, j = 0;
  int len = strlen( txt ); // Get length
  for ( i = 0 ; i < len ; i++ ){
    bool find = false;
    for ( ; txt[i] != '$' && txt[i] != '\0' ; i++ ); // Find '$' location
    if ( txt[i] == '\0' ) break; // No more '$': stop instead of reading past the end
    if ( txt[i+1] == '$' ) { // Check $$ case
      find = true;
      get[++k] = ' ';
    } // if
    for ( j = i + 1 ; txt[j] != '$' && txt[j] != '\0' ; j++ ){
      find = true;
      get[++k] = txt[j];
    } // for
    if ( find == true ) get[++k] = ' '; // add space
    i = j ;
  } // for

  if ( k >= 0 ) get[k] = '\0'; // remove last space (k is -1 if nothing was found)
  printf( "%s\n", get );
  return 0;
} // main()

Output:

21/01/2017 32/01/2017   abc

edited Jan 22 '21 at 02:15

answered Jan 21 '21 at 17:55

Itati


Note that this code will not stop if the second $ is missing – Damien Jan 21 '21 at 18:06
plenty of corner cases fail – 0___________ Jan 21 '21 at 18:27
@Damien: it will not even stop if the first `$` were missing... undefined behavior in both cases. – chqrlie Jan 21 '21 at 19:43
@Damien Thank you for your reminder, I updated the code. – Itati Jan 22 '21 at 02:06

Damien · Answer 4 · 2021-01-21T19:09:43.150

Extract text between $

This can be done with a simple for loop, reading and copying characters.
In the following code, the variable inside indicates whether or not we are currently between two $.

The function returns 1 if both $ were actually found.

#include <stdio.h>
#include <string.h>

// return 1 if two $ have been found, 0 otherwise
int extract (const char *in, char *out, char c) {
    if (in == NULL) return 0;
    int size = strlen(in);
    int inside = 0;
    int n = 0;      // size new string
    for (int i = 0; i < size; ++i) {
        if(in[i] == c) {
            if (inside) {
                inside = 2;
                break;  // 2nd $
            }
            inside = 1;         // 1st $
        } else {
            if (inside) {       // copy
                out[n++] = in[i];
            }
        }
    }
    out[n] = '\0';
    return inside == 2;
}

int main() {
    char txt[80] = "Some text before $11/01/2017$";
    char txt_extracted[80];
    int check = extract (txt, txt_extracted, '$');
    if (check) printf ("%s\n", txt_extracted);
    else printf ("two $ were not found\n");
    return 0;
}

edited Jan 21 '21 at 19:09

answered Jan 21 '21 at 18:00

Damien


Does not work in corner case https://godbolt.org/z/c18da9 – 0___________ Jan 21 '21 at 18:25
@0___________ A little far-fetched, but corrected to handle the NULL case. I tried to use `strnlen_s` to handle more corner cases, but my old gcc compiler (on my home PC) does not support it. – Damien Jan 21 '21 at 19:11
@0___________: `in == NULL` is not a corner case, it is outside the specification: the OP says the input is a `char` **array**. But handling null pointers gracefully is probably a good idea. – chqrlie Jan 21 '21 at 19:40
@chqrlie arrays cannot be passed by value. I think the OP's terminology is not very accurate, as he is a beginner. – 0___________ Jan 21 '21 at 20:04
@0___________: true, but undefined behavior when the argument does not point to a proper C string (which is the specified behavior of `strlen()`) is acceptable if documented. – chqrlie Jan 21 '21 at 20:05

score 0 · Answer 5 · answered Jan 21 '21 at 18:07

There is a function called strtok (https://www.cplusplus.com/reference/cstring/strtok/). Here is a video about it: https://www.youtube.com/watch?v=34DnZ2ewyZo.

I tried this code:

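#include <stdio.h>
#include <string.h>

int main()
{
    char txt[] = "Some text before $11/01/2017$, some text, $11/04/2018$ another text more and more text $01/02/2019$";
    int skip = 0;

    // The first call returns the text before the first '$' ("Some text before "), which is discarded
    char* piece = strtok(txt, "$");

    while (piece != NULL)
    {
        // Passing NULL continues tokenizing the same string
        piece = strtok(NULL, "$");

        if (piece == NULL)
            break;

        // Pieces alternate: text inside a $...$ pair, then text outside it
        if (skip != 1)
        {
            skip = 1;
            printf("%s \n", piece);
        }
        else
            skip = 0;
    }

    return 0;
}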

Output:

11/01/2017
11/04/2018
01/02/2019

answered Jan 21 '21 at 18:07

MedzsikTörtül


It works for me but could you maybe explain the code? – secdet Jan 21 '21 at 18:13
strtok splits the char[] at each "$". The if(skip != 1) test skips every other piece, so strings like "some text" are never printed. As for why piece = strtok(NULL, "$"); passes NULL, see https://stackoverflow.com/questions/23456374/why-do-we-use-null-in-strtok – MedzsikTörtül Jan 21 '21 at 18:28
@0___________ the first corner case can be fixed by just testing if(txt != NULL) before the while loop – MedzsikTörtül Jan 21 '21 at 18:36
Note: `strtok()` modifies the original string (replacing delimiters with `'\0'`) so it cannot be used with *String-Literals*. So if the string to be tokenized is a string-literal or if you need to preserve the original -- *make a mutable copy* and tokenize the copy. – David C. Rankin Jan 21 '21 at 19:16

0___________ · Answer 6 · 2021-01-21T19:13:42.943

Here is a function that handles corner cases well and uses string.h functions.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *texBetween(const char *str, char ch)
{
    char *result = NULL;
    const char *start, *end;

    if(str)
    {
        start = strchr(str, ch);
        if(start)
        {
            end = strchr(start + 1, ch);
            if(end)
            {
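                result = malloc(end - start);          /* token length + 1 for the '\0' */
                if(result)
                {
                    memcpy(result, start + 1, end - start - 1);
                    result[end - start - 1] = 0;       /* last allocated byte; index end - start would overflow */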
                }
            }
        }
    }
    return result;
}

int main()
{
    char *result;
    printf("\"%s\"\n", (result = texBetween("$$", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("$ $", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("$test$", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("test$$", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("test$test1$", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("test$1234$test$dfd", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween(NULL, '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("", '$')) ? result : "ERROR"); free(result);
    printf("\"%s\"\n", (result = texBetween("$", '$')) ? result : "ERROR"); free(result);

}

https://godbolt.org/z/K5P6zb

The test bench is not very explicit. You should output both the source and extracted string and check if the extracted string is the expected one. — chqrlie, Jan 21 '21 at 19:37
@chqrlie as it is very simple testing and the corner cases are quite easy to predict I believe it is sufficient — 0___________, Jan 21 '21 at 20:06
