I am trying to find all of the email address (contact) links for all of the U.S. senators. I want to do this by downloading the page with wget and then pattern-matching it with a regex, and then print the matches to the screen.
This is what the assignment requires the program to do:
Part 2: Add to the program a section that loads and parses the downloaded file extracting all of the web addresses. Your program must generate a listing on the screen of the extracted web addresses when it is run.
- Note that the web list must be perfect - no extra characters - to get full credit.
Is there a way to return all matches of a regex from a string, and if so, how would I go about doing it?
Unless I am mistaken, the regex functions I have found only find a single occurrence of the pattern, not all of them.
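With POSIX `<regex.h>`, `regexec()` does report only the first match, but you can collect all of them by calling it in a loop: after each hit, restart the search just past the end of that match (`rm_eo` is an offset relative to the string you passed in) and pass `REG_NOTBOL` so `^` no longer matches at the new starting point. A minimal sketch, assuming a POSIX system; `print_all_matches` is a hypothetical helper name:

```c
#include <regex.h>
#include <stdio.h>

/*
 * Print every non-overlapping match of `pattern` in `text`, one per line,
 * and return how many were found (or -1 if the pattern does not compile).
 * regexec() only reports the first match, so we loop and restart the
 * search just past the end of the previous match.
 */
int print_all_matches(const char *pattern, const char *text) {
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int count = 0;
    int flags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, m, flags) == 0) {
        /* Offsets are relative to p, not to the original text */
        printf("%.*s\n", (int)(m[0].rm_eo - m[0].rm_so), p + m[0].rm_so);
        count++;

        if (m[0].rm_eo == m[0].rm_so) {
            /* Empty match: skip one character so the loop cannot stall */
            if (p[m[0].rm_eo] == '\0')
                break;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;
        }
        flags = REG_NOTBOL; /* p is no longer the start of the string */
    }

    regfree(&re);
    return count;
}
```

For example, `print_all_matches("[0-9]+", "a1 b22 c333")` prints `1`, `22` and `333` on separate lines and returns 3. Printing with `%.*s` and the match offsets writes exactly the matched characters, with nothing copied or added.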
Here is the regex I am going to use:
Contact:\s+<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
This regex seems to work perfectly for what I am looking for.
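One caveat: that pattern uses Perl-style syntax. POSIX extended regexes (what `regcomp(..., REG_EXTENDED)` accepts) have no non-capturing groups `(?:...)` and no lazy quantifiers `*?`, and `\s` and backreferences like `\1` are not part of standard ERE (some implementations, such as glibc, accept them as extensions). To keep the pattern exactly as written you would need a Perl-compatible library such as PCRE2. The sketch below instead rewrites it as a portable POSIX ERE, handling the two quote styles with an alternation instead of `\1`. The macro `CONTACT_RE` and function `first_contact_url` are hypothetical names, and the sample HTML in the test is made up, not taken from the Senate page:

```c
#include <regex.h>
#include <string.h>

/* POSIX ERE version of: Contact:\s+<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
 *   \s           -> [[:space:]]
 *   (?: ... )    -> ( ... )        (ERE has no non-capturing groups)
 *   lazy *?      -> negated classes [^>]*, [^"]*, [^']*
 *   (["'])...\1  -> "..." | '...'  (no portable backreference)
 * Group 3 holds a double-quoted URL, group 4 a single-quoted one. */
#define CONTACT_RE \
    "Contact:[[:space:]]+<a[[:space:]]+([^>]*[[:space:]]+)?" \
    "href=(\"([^\"]*)\"|'([^']*)')"

/* Copy the URL of the first contact link in `html` into `out` (truncated
 * to outsz - 1 chars). Returns 1 on a match, 0 if none, -1 on error. */
int first_contact_url(const char *html, char *out, size_t outsz) {
    regex_t re;
    regmatch_t m[5];
    int rc;

    if (outsz == 0 || regcomp(&re, CONTACT_RE, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, html, 5, m, 0);
    if (rc == 0) {
        /* The quote style that did not participate has rm_so == -1 */
        regmatch_t g = (m[3].rm_so != -1) ? m[3] : m[4];
        size_t len = (size_t)(g.rm_eo - g.rm_so);
        if (len >= outsz)
            len = outsz - 1;
        memcpy(out, html + g.rm_so, len);
        out[len] = '\0';
    }

    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Capturing only the text between the quotes (group 3 or 4) rather than the whole match is what keeps the extracted address free of `href=`, quotes, or other extra characters.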
This is my current code for the program; I just don't know how to go about implementing the regex:
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    // Declare all used variables
    FILE *file;
    char *buffer;
    long size;
    size_t nread;

    // Download the Senate contact page with wget
    // (-O fixes the output name, so a leftover copy isn't saved as *.1)
    if (system("wget -q -O senators_cfm.cfm http://www.senate.gov/general/contact_information/senators_cfm.cfm") != 0) {
        printf("wget failed to download the page\n");
        return 1;
    }

    // Attempt to open the file
    file = fopen("senators_cfm.cfm", "r");
    if (file == NULL) {
        printf("Was unable to open file\n");
        return 1;
    }

    // Seek to the end of the file; the position there is
    // the number of bytes in the file
    fseek(file, 0L, SEEK_END);
    size = ftell(file);
    if (size < 0) {
        printf("Unable to determine file size\n");
        fclose(file);
        return 1;
    }

    // Allocate one extra byte for a terminating '\0': the regex
    // functions operate on NUL-terminated C strings
    buffer = calloc((size_t)size + 1, sizeof(char));
    if (buffer == NULL) {
        printf("Unable to allocate memory needed\n");
        fclose(file);
        return 1;
    }

    // Reset the reader to the start of the file
    rewind(file);

    // Read the whole file into the buffer and terminate it
    nread = fread(buffer, sizeof(char), (size_t)size, file);
    buffer[nread] = '\0';

    // Close the file; everything we need is now in memory
    fclose(file);

    // This is where I will be implementing the regex (or at least the call
    // to the function)

    // Free all memory that we allocated, then delete the downloaded file
    free(buffer);
    unlink("senators_cfm.cfm");
    return 0;
}
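Putting the loop and the POSIX translation of the pattern together gives something that could be called at the placeholder comment, after the file has been read and before `buffer` is freed, as `print_contact_links(buffer, stdout);`. This is a sketch, not a tested solution for the live page: it assumes the page still marks each link with `Contact:` followed by an `<a ... href=...>` tag, and `print_contact_links` and `CONTACT_RE` are hypothetical names. It also assumes `buffer` is NUL-terminated, which requires allocating `size + 1` bytes and writing a `'\0'` after the data that `fread()` returns.

```c
#include <regex.h>
#include <stdio.h>

/* POSIX ERE translation of the question's regex: [[:space:]] for \s,
 * a plain group for (?:...), negated classes instead of lazy *?, and a
 * "..."|'...' alternation instead of the \1 backreference. */
#define CONTACT_RE \
    "Contact:[[:space:]]+<a[[:space:]]+([^>]*[[:space:]]+)?" \
    "href=(\"([^\"]*)\"|'([^']*)')"

/* Print the URL of every contact link in `buffer` to `out`, one per line
 * with no quotes or other extra characters. Returns the number printed,
 * or -1 if the pattern fails to compile. */
int print_contact_links(const char *buffer, FILE *out) {
    regex_t re;
    regmatch_t m[5];
    const char *p = buffer;
    int count = 0;
    int flags = 0;

    if (regcomp(&re, CONTACT_RE, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 5, m, flags) == 0) {
        /* Group 3 = double-quoted URL, group 4 = single-quoted URL */
        regmatch_t g = (m[3].rm_so != -1) ? m[3] : m[4];
        fprintf(out, "%.*s\n", (int)(g.rm_eo - g.rm_so), p + g.rm_so);
        count++;

        /* Every match starts with "Contact:", so it is never empty and
         * advancing past its end always makes progress */
        p += m[0].rm_eo;
        flags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

Taking the URL from the capture group and printing it with `%.*s` is what satisfies the "no extra characters" requirement: only the bytes between the quotes are written, followed by a newline.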