
How do you extract the string between two specified strings? For example: `<title>Extract this</title>`. Is there a simple way to get it using `strtok()` or anything simpler?

EDIT: The two specified strings are `<title>` and `</title>`, and the string to extract is `Extract this`.

  • `strstr()` is better. – Iharob Al Asimi May 18 '15 at 11:45
  • Just to echo what Mr. @iharob said, see [here](http://linux.die.net/man/3/strstr). – Sourav Ghosh May 18 '15 at 11:46
  • I think OP wanted something along the lines of `[extract me]<title>` as a [regexp](http://stackoverflow.com/questions/1085083/regular-expressions-in-c-examples). – Eregrith May 18 '15 at 11:47
  • `sscanf(string, "<title>%[^<]</title>", extracted_string);` or `sscanf(string, "%*[^>]>%[^<]<%*[^>]>" , extracted_string);` will do the job. Checking the return value is also recommended. The `</title>` in the first `sscanf` and `<%*[^>]>` in the second aren't required. – Spikatrix May 18 '15 at 11:57
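
The generic `sscanf()` form from the last comment can be sketched as below. This is only a minimal illustration: the helper name `scan_tag_text` and the 64-byte buffer size are assumptions, and a field width is added to the format so the copy cannot overflow the buffer.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): stores the text that follows
 * the first '>' in `out`, stopping at the next '<'.
 * `out` must have room for at least 64 bytes.
 * Returns 1 on success, 0 if nothing could be extracted. */
int scan_tag_text(const char *s, char out[64])
{
    /* %*[^>]   skip (and discard) one or more characters up to the first '>'
     * >        consume that '>'
     * %63[^<]  store up to 63 characters, stopping before the next '<' */
    return sscanf(s, "%*[^>]>%63[^<]", out) == 1;
}
```

Comparing the return value of `sscanf()` against the number of expected conversions (here 1) is what tells you whether the extraction succeeded. Note that an empty element such as `<title></title>` counts as a failure with this format, because `%[` must match at least one character.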

3 Answers

  • Search for the first substring using `strstr()`.
  • If it is found, save a pointer (or array index) to where it starts.
  • From there, search for the second substring.
  • If that is found too, everything between [start of substring 1] + [length of substring 1] and [start of substring 2] is the string you are interested in.
  • Extract the string using `strncpy()` or `memcpy()`, then null-terminate the copy.
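
The steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the function name `copy_between` and the choice to copy into a caller-supplied buffer are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the answer): copies the text found between
 * `left` and `right` in `s` into `out`.
 * Returns 0 on success, -1 if a delimiter is missing or `out` is too small. */
int copy_between(const char *s, const char *left, const char *right,
                 char *out, size_t out_size)
{
    /* Steps 1-2: find the first substring and step past it */
    const char *start = strstr(s, left);
    if (start == NULL)
        return -1;
    start += strlen(left);

    /* Step 3: search for the second substring from that point on */
    const char *end = strstr(start, right);
    if (end == NULL)
        return -1;

    /* Steps 4-5: copy what lies in between and null-terminate it */
    size_t len = (size_t)(end - start);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Copying into a caller-supplied buffer avoids dynamic allocation, at the cost of the caller having to pick a buffer size up front.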
Iharob Al Asimi
Lundin

Here is an example of how you can do it. Note that it does not check the integrity of the input string.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *extract(const char *const string, const char *const left, const char *const right)
{
    const char *head;
    const char *tail;
    size_t      length;
    char       *result;

    if ((string == NULL) || (left == NULL) || (right == NULL))
        return NULL;
    /* Find the left delimiter and step past it */
    length = strlen(left);
    head   = strstr(string, left);
    if (head == NULL)
        return NULL;
    head += length;
    /* Find the right delimiter, searching only after the left one */
    tail  = strstr(head, right);
    if (tail == NULL)
        return NULL;
    /* Copy the text in between into a new null-terminated buffer;
     * the caller owns the buffer and must free() it */
    length = (size_t)(tail - head);
    result = malloc(1 + length);
    if (result == NULL)
        return NULL;
    memcpy(result, head, length);
    result[length] = '\0';
    return result;
}

int main(void)
{
    char  string[] = "<title>The Title</title>";
    char *value;

    value = extract(string, "<title>", "</title>");
    if (value != NULL)
        printf("%s\n", value);
    free(value);

    return 0;
}
Iharob Al Asimi
  • Do you have to free `value` if it was declared on the stack? – user1717828 May 18 '15 at 12:22
  • @user1717828 No you must not free it. – Iharob Al Asimi May 18 '15 at 12:33
  • @user1717828 it's **not** a stack variable, please read the code carefully. The pointer is stored on the stack; the data the pointer points to is clearly on the heap, since it was allocated with `malloc()` in `extract()`. – Iharob Al Asimi May 18 '15 at 12:36
  • Whoops, still learning C! So you point `result` to memory on the heap, return the `result` pointer to `value`, and at the end run `free(value);` to free up the memory that `result` pointed to? Is it standard to free up space with a different pointer than the one the initial allocation was assigned to? Sorry for the erroneous edit. – user1717828 May 18 '15 at 12:47
  • It's a different pointer pointing to the same address returned by `malloc()`. – Iharob Al Asimi May 18 '15 at 12:48

The answer by Mr. @Lundin is a nice one. However, to add a slightly more generic approach (one that does not depend on the `<tag>` value itself), you can also do the following:

  1. Locate the first instance of `<` (the tag's opening angle bracket) using `strchr()`.
  2. Find the first instance of `>` (the tag's closing angle bracket) using `strchr()`.
  3. Save both indexes and, using their difference, copy the text between them to a temporary array. Treat this as the tag name.
  4. Locate the last instance of `<` using `strrchr()`.
  5. Find the last instance of `>` using `strrchr()`.
  6. Again, save both indexes and, using their difference, copy the text between them to another temporary array. Compare it with the previously stored tag name (skipping the leading `/` of the closing tag). If they are equal, do a `memcpy()` / `strdup()` from `actualarray[first_last_index]` (the end of the opening tag) up to `actualarray[last_first_index]` (the start of the closing tag).
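
The numbered steps above might look roughly like this in code. It is a sketch under stated assumptions: the function name `extract_tag_value` is hypothetical, the tag names are compared in place instead of being copied to temporary arrays, and the input is assumed to hold a single element with no attributes or nesting.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the answer): returns a malloc'd copy of the
 * text between the first opening tag and the last closing tag, provided both
 * carry the same name. Returns NULL on any mismatch; the caller must free()
 * the result. */
char *extract_tag_value(const char *s)
{
    const char *open_lt, *open_gt, *close_lt, *close_gt;
    size_t name_len, value_len;
    char *result;

    if (s == NULL)
        return NULL;

    /* Steps 1-2: first '<' and the first '>' after it */
    open_lt = strchr(s, '<');
    if (open_lt == NULL)
        return NULL;
    open_gt = strchr(open_lt, '>');
    if (open_gt == NULL)
        return NULL;

    /* Steps 4-5: last '<' and last '>' */
    close_lt = strrchr(s, '<');
    close_gt = strrchr(s, '>');
    if (close_lt == NULL || close_gt == NULL ||
        close_lt <= open_gt ||      /* no separate closing tag */
        close_gt < close_lt ||      /* closing tag is never terminated */
        close_lt[1] != '/')         /* last tag is not a closing tag */
        return NULL;

    /* Steps 3 and 6: compare the two tag names, skipping the "</" */
    name_len = (size_t)(open_gt - open_lt - 1);
    if ((size_t)(close_gt - close_lt - 2) != name_len ||
        memcmp(open_lt + 1, close_lt + 2, name_len) != 0)
        return NULL;

    /* Copy everything between the opening tag's '>' and the closing tag's '<' */
    value_len = (size_t)(close_lt - (open_gt + 1));
    result = malloc(value_len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, open_gt + 1, value_len);
    result[value_len] = '\0';
    return result;
}
```

Because the tag name is read from the input itself, the same function works for `<title>`, `<h1>`, or any other simple tag without passing the delimiters in.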
Community
Sourav Ghosh