I am trying to find the email address links for all US senators. I want to do this by downloading the page with Wget and then pattern matching with a regex, and finally print the matches to the screen.

This is the part of the assignment that the program needs to satisfy:

Part 2: Add to the program a section that loads and parses the downloaded file extracting all of the web addresses. Your program must generate a listing on the screen of the extracted web addresses when it is run.

  • Note that the web list must be perfect - no extra characters - to get full credit.

Is there a way to return all matches of a regex from a string, and if so, how would I go about doing it?

Unless I am mistaken, I have only found regex functions that find a single occurrence of the regex, not all of them.

Here is the regex that I plan to use:

Contact:\s+<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1

This regex seems to work perfectly for what I am looking for.

This is my current code for the program. I just don't know how to go about implementing the regex:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {

    // Declare all variables used below
    FILE *file;
    char *buffer;
    long size;
    size_t bytes_read;

    // Download the Senate contact page with wget
    system("wget -q http://www.senate.gov/general/contact_information/senators_cfm.cfm");

    // Attempt to open the downloaded file
    file = fopen("senators_cfm.cfm", "r");
    if (file == NULL) {
        printf("Was unable to open file \n");
        return 1;
    }

    // Seek to the end of the file and determine its size in bytes
    fseek(file, 0L, SEEK_END);
    size = ftell(file);
    if (size < 0) {
        printf("Unable to determine file size \n");
        fclose(file);
        return 1;
    }

    // Allocate one extra byte so the buffer can be NUL-terminated
    buffer = calloc((size_t)size + 1, sizeof(char));
    if (buffer == NULL) {
        printf("Unable to allocate memory needed \n");
        fclose(file);
        return 1;
    }

    // Reset the reader to the start of the file
    rewind(file);

    // Read the whole file into the buffer and terminate it as a C string
    bytes_read = fread(buffer, sizeof(char), (size_t)size, file);
    buffer[bytes_read] = '\0';

    // This is where I will be implementing the regex (or at least the call
    // to the function)

    // Close the file
    fclose(file);

    // Free the memory that we allocated
    free(buffer);

    // Remove the downloaded file
    unlink("senators_cfm.cfm");
    return 0;
}
  • There's `std::regex*` in C++, but there's nothing like that in C. Implementing a regex engine is a huge task, so you may be better off parsing and checking the string yourself – phuclv Sep 09 '19 at 03:50
  • Well, I know there are libraries that do it for C; I just don't know which of them, if any, can match all occurrences. – Jeffrey Hennen Sep 09 '19 at 03:52
  • If you want a library suggestion, that's off-topic here. Try [softwarerecs.se] instead – phuclv Sep 09 '19 at 03:53
  • You could use POSIX [`<regex.h>`](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html), except that it doesn't support notations such as `\s` (you'd have to use the verbose but equivalent `[[:space:]]` notation, but there are other problematic PCRE-style notations), or you could use [PCRE](https://pcre.org/), which does support `\s`, or you can find other regex packages by searching. – Jonathan Leffler Sep 09 '19 at 04:38
  • @JonathanLeffler So, if I end up using the verbose equivalent notation, that still would not solve the issue of me wanting to grab all the matches, correct? If there is a way to do that or something similar, I would love to know. – Jeffrey Hennen Sep 09 '19 at 19:03
  • Correct. The POSIX regular expression code won’t handle the non-capturing grouping: `(?:…)` either. But you could live with just `(…)` I think, and ignore the capture. The non-greedy `*?` qualifier is much more difficult to deal with. You need the PCRE library because you’re using Perl-compatible regular expression notation. Anything else is going to be hard, unless you find a PCRE-compatible regex library. – Jonathan Leffler Sep 09 '19 at 19:10
  • See also [How to use PCRE in C for multiple matches?](https://stackoverflow.com/questions/57859449/how-to-use-pcre-in-c-for-multiple-matches), a development of this question (neither is a duplicate of the other). – Jonathan Leffler Sep 09 '19 at 20:56
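
For reference, the POSIX `<regex.h>` route mentioned in the comments could be sketched roughly as below. `regexec()` reports only the first match, so the usual approach is to call it in a loop and restart the search at the end of the previous match. This is a minimal sketch under stated assumptions, not a drop-in solution: the helper name `print_all_links` is hypothetical, and the pattern is a hand-translated POSIX ERE approximation of the PCRE pattern in the question (`\s` becomes `[[:space:]]`, `(?:…)` becomes a plain group, and the lazy `(.*?)\1` is replaced by "anything up to the next quote", which would cut short a URL that itself contains a quote character).

```c
#include <regex.h>
#include <stdio.h>

/* Assumed POSIX ERE approximation of the question's PCRE pattern. */
static const char *LINK_PATTERN =
    "Contact:[[:space:]]+<a[[:space:]]+([^>]*[[:space:]]+)?"
    "href=[\"']([^\"']*)[\"']";

/* Hypothetical helper: writes every captured URL found in `text` to `out`,
 * one per line. Returns the number of matches, or -1 if the pattern
 * fails to compile. */
int print_all_links(const char *text, FILE *out)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] whole match, m[1] optional attributes, m[2] URL */
    const char *cursor = text;
    int count = 0;

    if (regcomp(&re, LINK_PATTERN, REG_EXTENDED) != 0)
        return -1;

    /* Search again from the end of each match until nothing more is found. */
    while (regexec(&re, cursor, 3, m, 0) == 0) {
        int len = (int)(m[2].rm_eo - m[2].rm_so);
        fprintf(out, "%.*s\n", len, cursor + m[2].rm_so);
        count++;
        if (m[0].rm_eo == 0)   /* guard: an empty match would loop forever */
            break;
        cursor += m[0].rm_eo;  /* offsets are relative to `cursor` */
    }

    regfree(&re);
    return count;
}
```

In the program from the question, this would be called as `print_all_links(buffer, stdout);` at the point marked for the regex, provided the buffer is NUL-terminated. Keeping the original pattern unchanged (lazy quantifiers and the `\1` backreference) would need a PCRE-compatible library instead, as the comments note.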