I'm using the regex.h library for my C program.

I need to download every file whose link is stored in an <a> tag in some HTML data, so my first task is to extract the contents of each "href" attribute.

I use this address to practice: http://students.iitk.ac.in/programmingclub/course/lectures/

In its HTML content there are many tags like:

<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.pdf">
<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.ppt">
<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.pptx">
...

I wrote a regex string to extract the content of the "href" attribute:

char regex[] = "href=\"([a-zA-Z0-9%.,]*\\.[a-zA-Z0-9]*{1,4})\"";

What I expect from the regex (I can handle the full match and the group match myself):

1.%20Introduction%20to%20C%20language%20and%20Linux.pdf
1.%20Introduction%20to%20C%20language%20and%20Linux.ppt
1.%20Introduction%20to%20C%20language%20and%20Linux.pptx
...

What I receive is only the first link (I only care about the group match):

1.%20Introduction%20to%20C%20language%20and%20Linux.pdf

Have a nice day, and thank you very much.

P.S. I pass REG_EXTENDED to regcomp().
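
For reference, a single regexec() call reports only the first match in the string it is given, which would be consistent with seeing just one link. A minimal sketch of the usual workaround, calling regexec() in a loop and advancing the start pointer past each match, might look like the following. The helper name extract_hrefs, the 256-byte buffer size, and the simplified pattern are all assumptions made for illustration, not part of the original program.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies up to max_links href values from html
   into links. Returns the number found, or -1 if regcomp() fails. */
static int extract_hrefs(const char *html, char links[][256], int max_links)
{
    /* Simplified pattern (an assumption): capture everything up to the
       closing quote instead of listing the allowed characters. */
    const char *pattern = "href=\"([^\"]*)\"";
    regex_t re;
    regmatch_t m[2];            /* m[0] = full match, m[1] = group 1 */
    const char *cursor = html;  /* where the next search starts */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Each regexec() call finds the first match at or after cursor. */
    while (count < max_links && regexec(&re, cursor, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= 256)
            len = 255;          /* truncate overly long values */
        memcpy(links[count], cursor + m[1].rm_so, len);
        links[count][len] = '\0';
        count++;
        cursor += m[0].rm_eo;   /* offsets are relative to cursor */
    }

    regfree(&re);
    return count;
}
```

Calling extract_hrefs(html, links, n) on the page content and printing links[0] through links[count - 1] should then list every href value rather than only the first. Note that the offsets in regmatch_t are relative to the pointer passed to regexec(), which is why the cursor is advanced by m[0].rm_eo each time.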

Steve Buzonas