valgrind warns you about uninitialized data (used in a test). Since your program only calls sscanf and printf, the problem is very probably in your sscanf calls.
If I change your program slightly to print the result of each sscanf, so it shows how many elements it got:
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char* argv[]) {
  char *remain_html = (char *)malloc(sizeof(char) * 1001);
  char *url = (char *)malloc(sizeof(char) * 101);
  char *html = "<A class=\"mw-jump-link\" HREF=\"#mw-head\">Jump to navigation</a>"
    "<a class=\"mw-jump-link\" href=\"#p-search\">Jump to search</a>";

  printf("html: %s\n\n", html);
  printf("%d\n", sscanf(html, "<a href=\"%s", remain_html));
  printf("after first href tag: %s\n\n", remain_html);
  printf("%d\n", sscanf(remain_html, "%s\">", url));
  printf("first web: %s\n\n", url);
  printf("%d\n", sscanf(remain_html, "<a href=\"%s", remain_html));
  printf("after second href tag: %s\n\n", remain_html);
  free(remain_html);
  free(url);
}
The execution is:
pi@raspberrypi:/tmp $ ./a.out
html: <A class="mw-jump-link" HREF="#mw-head">Jump to navigation</a><a class="mw-jump-link" href="#p-search">Jump to search</a>
0
after first href tag:
-1
first web:
-1
after second href tag:
pi@raspberrypi:/tmp $
So the first sscanf got nothing (0 elements). That means it does not set remain_html, which is still uninitialized when it is used by the following printf and sscanf, with undefined behavior.
Because of the format
"<a href=\"%s"
the first sscanf expects a string starting with
<a href="
but html starts with
<A class=
which is different, so matching stops at the second character and remain_html is not set.
sscanf is not the right tool here. Instead, search for the prefix <a href=" (which may be in uppercase), for instance using strcasestr, then extract the URL up to the closing "
Example:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* in case you do not have that function (it is a non-standard extension);
   remove this definition if your libc already provides it */
char * strcasestr(const char * haystack, const char * needle)
{
  while (*haystack) {
    const char * ha = haystack;
    const char * ne = needle;

    /* tolower needs a value representable as unsigned char */
    while (tolower((unsigned char) *ha) == tolower((unsigned char) *ne)) {
      if (!*++ne)
        return (char *) haystack; /* whole needle matched */
      ha += 1;
    }
    haystack += 1;
  }
  return NULL;
}

int main(int argc, char* argv[]) {
  const char *html = "<A HREF=\"http://www.google.com\">navigation</a>"
    "<a href=\"/a.html\">search</a>";
  const char * begin = html;
  const char * end;

  printf("html: %s\n", html);

  while ((begin = strcasestr(begin, "<a href=\"")) != NULL) {
    begin += 9; /* bypass the header <a href=" (9 characters) */
    end = strchr(begin, '"');
    if (end != NULL) {
      /* print only the characters between the two quotes */
      printf("found '%.*s'\n", (int) (end - begin), begin);
      begin = end + 1;
    }
    else {
      puts("invalid url");
      return -1;
    }
  }
  return 0;
}
Compilation and execution:
pi@raspberrypi:/tmp $ gcc -Wall a.c
pi@raspberrypi:/tmp $ ./a.out
html: <A HREF="http://www.google.com">navigation</a><a href="/a.html">search</a>
found 'http://www.google.com'
found '/a.html'
pi@raspberrypi:/tmp $
Note: I know the second argument of strcasestr is in lowercase here, so applying tolower to *ne is useless and *ne alone would be enough, but I gave a definition of the function that does not depend on the current context.