0

I'm trying to split a string in C (not C#, C++ or any other language). I tried using the strtok function, but it turns out that it only works when the delimiter between words is a single character such as a space or a semicolon.

I have a variable which is a string that contains html code like this:

</head>
<body>
Index of /davidgoudet
<ul><li><a href="/"> Parent Directory</a></li>
<li><a href="Horario/"> Horario/</a></li>
<li><a href="Oferta/"> Oferta/</a></li>
<li><a href="Registro/"> Registro/</a></li>
</ul>
<address>Apache mod_fcgid/2.3.6 mod_auth_passthrough/2.1 mod_bwlimited/1.4                FrontPage/5.0.2.2635 Server at turpialdevelopment.com Port 80</address>
</body></html>

I want to get the chunks between the href attributes, such as Horario, Oferta and Registro, into a variable, but when I tried strtok(string, "href") it gave me a weird result that is not what I'm looking for.

Any ideas? Thanks

camelCase
  • Use a [parser](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). He comes. – David Heffernan Oct 12 '11 at 21:30

6 Answers

4

strtok takes a char array of all possible delimiters and it splits based on any of those characters (in your case, splitting on h, r, e, or f), which is probably why you are seeing weird behavior.
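
To make the delimiter-set behaviour concrete, here is a minimal sketch. The helper name count_tokens is only for illustration and is not part of any library. It prints every piece strtok produces and returns how many there were.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical name): prints each token strtok
 * produces for the given delimiter set and returns the token count.
 * strtok writes '\0' bytes into buf, so buf must be a writable array,
 * not a string literal.
 */
static int count_tokens(char *buf, const char *delims)
{
    int n = 0;
    char *tok;

    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("[%s]\n", tok);
        n++;
    }
    return n;
}

/*
 * Example call:
 *     char buf[] = "the chef";
 *     count_tokens(buf, "href");
 */
```

With the input "the chef" and the delimiter string "href", this prints [t] and [ c]: every h, r, e and f is treated as a separator on its own, and the word "href" as a whole is never looked for.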

Is there a reason why you aren't using an HTML parsing library to pull the names?

The libxml HTML parser is pretty good: http://www.xmlsoft.org/html/libxml-HTMLparser.html

Foo Bah
1

This is my solution; I hope it solves your problem.

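#include <stdlib.h>  /* malloc */

/* Splits str on the single character spliter, skipping empty pieces.
   Stores a NULL-terminated array of newly allocated strings in *dst and
   returns the number of pieces. The caller frees each string and the array. */
int split(char ***dst, char *str, char spliter)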
{
    int str_num = 0;    
    int each_size;   
    int index = 0;     
    int str_index = 0;  
    int start_index = 0;

    while (str[index] != '\0')
    {
        if (str[index] == spliter)
        {
            str_num++;
            index++;
            while(str[index] == spliter)
            {
                index++;
            }
        }
        else
        {
            index++;
        }
    }
    str_num++;

    *dst = (char **) malloc((str_num + 1)*sizeof(char*));
    index = 0;

    while (str[index] != '\0')
    {
        if (str[index] != spliter)
        {
            start_index = index;
            each_size = 0;

            while (str[index] != spliter && str[index] != '\0')
            {
                index++;
                each_size++;
            }

            (*dst)[str_index] = (char*) malloc((each_size + 1)*sizeof(char));
            int cur_i = 0;

            while (start_index != index)
            {
                (*dst)[str_index][cur_i] = str[start_index];
                start_index++;
                cur_i++;
            }

            (*dst)[str_index][cur_i] = '\0';
            str_index++;
        }
        else
        {
            index++;
        } 
    }

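    /* str_num can overcount when str starts or ends with spliter, so
       terminate and report using the number of pieces actually stored. */
    (*dst)[str_index] = NULL;
    return str_index;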
}
孙维松
1

Why don't you just use a proper HTML parser? libxml2 has a nice HTML parser in C.

Ed S.
0
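#include <string.h>  /* strlen, memcpy */

/* Copies the text before the first chr into output (which must be large
   enough) and returns a pointer to the rest of the string, or NULL when
   nothing is left. */
char* split(char *string, char chr, char *output){
    size_t len = strlen(string);
    size_t seek;
    for(seek = 0; seek < len; seek++){
        if( string[seek] == chr ){
            break;
        }
    }
    memcpy(output, string, seek);
    output[seek] = '\0';
    if( (seek + 1) >= len ){
        return NULL;
    }
    return (string + seek + 1);
}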

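Usage:

char *string = "hello world";
char out[64];   /* must be large enough for the longest token */

while(1){
    string = split(string, ' ', out);
    /* use out here: it holds the token that was just cut off */
    if(string == NULL) break;
}

It stores the cut token in out and returns a pointer to the rest of the string, or NULL when nothing is left.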

ali naderi
0

Try using strstr() and then offsetting the pointer it returns to you.

strstr(big_string_of_tags,"href")+6; //Leaves pointer at the word you're seeking, read up until you see a double quote char.

It's not a very elegant solution, but if you're constrained to C alone it might be a good start.
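
A minimal sketch of that idea, assuming every link is written exactly as href="..." with double quotes and no spaces around the equals sign, as in the listing from the question. The helper name next_href is made up for illustration. Unlike the one-liner above, it checks for a NULL result from strstr before offsetting the pointer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the value of the next href="..." attribute
 * found in html into out (at most outsz - 1 characters, always
 * NUL-terminated, truncated if it does not fit).
 * Returns a pointer just past the closing quote so the caller can keep
 * scanning, or NULL when no further complete href="..." is found.
 */
static const char *next_href(const char *html, char *out, size_t outsz)
{
    static const char key[] = "href=\"";
    const char *start;
    const char *end;
    size_t len;

    if (outsz == 0)
        return NULL;

    start = strstr(html, key);
    if (start == NULL)
        return NULL;
    start += sizeof key - 1;          /* skip the 6 characters of href=" */

    end = strchr(start, '"');         /* closing quote of the value */
    if (end == NULL)
        return NULL;

    len = (size_t)(end - start);
    if (len >= outsz)
        len = outsz - 1;              /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return end + 1;
}
```

Calling it in a loop, feeding the returned pointer back in as html until it returns NULL, visits each link in turn. On the listing from the question that would include the "/" of the Parent Directory entry, so that one has to be filtered out separately.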

Grambot
0

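You can use a string search function like strstr() (or strnstr(), a BSD extension that is not part of standard C) to locate substrings, such as begin and end tags. Then you can easily calculate the position and length of the substring you want and use strncpy() to copy that data, adding the terminating '\0' yourself, since strncpy() does not write one when the source is longer than the count.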
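
A rough sketch of that approach using only standard strstr(). The function name copy_between and its return convention are assumptions made for this example, not an existing API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: finds the first occurrence of begin in text, then
 * the first occurrence of end after it, and copies what lies between the
 * two markers into out (truncated to outsz - 1 characters).
 * Returns the number of characters copied, or -1 if either marker is
 * missing or outsz is 0.
 */
static int copy_between(const char *text, const char *begin,
                        const char *end, char *out, size_t outsz)
{
    const char *start;
    const char *stop;
    size_t len;

    if (outsz == 0)
        return -1;

    start = strstr(text, begin);
    if (start == NULL)
        return -1;
    start += strlen(begin);           /* position right after the begin marker */

    stop = strstr(start, end);
    if (stop == NULL)
        return -1;

    len = (size_t)(stop - start);     /* length of the wanted substring */
    if (len >= outsz)
        len = outsz - 1;
    strncpy(out, start, len);         /* copies len bytes, no terminator added */
    out[len] = '\0';
    return (int)len;
}
```

For a line of the listing such as <li><a href="Horario/"> Horario/</a></li>, passing "\"> " as begin and "</a>" as end leaves Horario/ in out and returns 8.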

Caleb