
My question is unfortunately badly formed, as I'm not entirely certain what to call what I'm trying to do. My apologies for that.

It came up since I'm trying to code a really basic browser that I want to implement in C and I was thinking about how best to go about it. The basic idea being something like libcurl (for network interaction) -> libxml2 (parse HTML) -> UI and then some manner of getting libcurl to accept GET or POST requests from the UI (haven't gotten to this point yet).

However, this approach is severely limited. If I want to, say, check whether the response is a PDF and send it off to libpoppler before handing it to libxml2, I'll have to recode my entire program flow. Further, if I want to use parts of my program (say, the libcurl -> pdftohtml -> libxml2 part) and send the result off to another program (for example w3m instead of my UI), I again don't see how I would manage that.

I could instead simply write a Perl or Python wrapper for curl, libxml2, etc., or do something along the lines of "curl example.com | parser | UI". However, doing it in Perl or Python still seems like I'll have to recode my program logic every time I want to do something new, and piping everything seems inelegant. I would also like to do this in C if possible.

So my question is: what does one call this idea? I've been driving myself crazy trying to figure out how to search for a solution to a problem that I can't name. I know it has something to do with modularity, but I don't know what specifically, and modularity is a very broad term. Secondly, and optionally, if anybody could point me in the direction of a solution I would appreciate that as well, although it's not as important as what it's called.

Thanks to all who read this. :)

Ellen
  • It's probably worth noting that an XML parser (libxml2) *cannot* parse HTML directly. You would have to first run the HTML through a step that transformed it into syntactically correct XML, which would be challenging (and then you would have written an HTML parser anyway). – Greg Hewgill May 14 '15 at 21:21
  • I don't really understand what you mean. You send a request and then you receive a response; when you do, you check what it is and call the appropriate function to do what you want (render HTML, open a PDF file, etc.). Why would you need to **recode** the whole thing? Your idea of that doesn't make sense to me. – Iharob Al Asimi May 14 '15 at 21:24
  • @GregHewgill AFAIK `libxml2` has a html parser interface. – Iharob Al Asimi May 14 '15 at 21:25
  • @iharob: Interesting, I didn't know that. Might indeed be useful then :) – Greg Hewgill May 14 '15 at 21:38
  • Yes, libxml2 does have a parser interface. – Ellen May 14 '15 at 21:47
  • I'll try to be a bit more clear; it seems it's actually multiple questions. This post http://stackoverflow.com/questions/29002253/how-do-i-modular-design-in-c partially answered my question: I can simply pick the modules I want to use dynamically and load them at runtime. However, how do I dynamically alter the program logic of the main program at run time? For example, I decide that I only want to load the PDF to HTML conversion module and save the results to a file. This dramatically alters the main program logic. Lastly, how would I make the modules capable of acting as standalone programs? – Ellen May 14 '15 at 21:54
  • You will find that it is very difficult to design/code/test a project if you have not first detailed exactly what the project is expected to do, and under what conditions it is expected to work. – user3629249 May 14 '15 at 23:32

1 Answer


First, I suggest you take a look at C Interfaces and Implementations: http://www.amazon.com/Interfaces-Implementations-Techniques-Creating-Reusable/dp/0201498413. Second, most browsers are asynchronous, so you are going to need an event library like libuv or libev. Also, most modern websites require JavaScript to function properly, but adding a JavaScript engine to your browser would greatly complicate the project. I also don't see any mention of how you plan on parsing the HTTP being sent to and from your browser; I suggest https://github.com/joyent/http-parser.

As for your question on control flow, I would have a function that parses the response from the server and uses switch() to handle the various types of data being sent to your browser. There is a field in the HTTP header (Content-Type) which states the content type, so your browser can call different functions based on what the content type is.
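To make the content-type check concrete, here is a minimal sketch of turning the value of a Content-Type header into a tag you can switch on. The enum, the function name `content_type_from_header`, and the short list of MIME types are my own assumptions for illustration, not part of any library; a real browser would need a much longer list.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags; the BCT_ prefix just mirrors the names used in this answer. */
enum body_content_type { BCT_UNKNOWN, BCT_HTML, BCT_CSS, BCT_JAVASCRIPT, BCT_PDF };

/* Compare the first n characters of a and b, ignoring ASCII case. */
static int equal_ignore_case(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Map a header value such as "text/html; charset=utf-8" to a tag.
   Parameters after the ';' (charset and so on) are ignored. */
enum body_content_type content_type_from_header(const char *value)
{
    static const struct { const char *mime; enum body_content_type type; } known[] = {
        { "text/html",              BCT_HTML },
        { "text/css",               BCT_CSS },
        { "text/javascript",        BCT_JAVASCRIPT },
        { "application/javascript", BCT_JAVASCRIPT },
        { "application/pdf",        BCT_PDF },
    };
    size_t len, i;

    if (value == NULL)
        return BCT_UNKNOWN;

    /* Skip leading whitespace. */
    while (*value == ' ' || *value == '\t')
        value++;

    /* The media type ends at ';', whitespace, or the end of the string. */
    len = strcspn(value, "; \t");

    for (i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strlen(known[i].mime) == len && equal_ignore_case(value, known[i].mime, len))
            return known[i].type;

    return BCT_UNKNOWN;
}
```

With this, `content_type_from_header("text/html; charset=utf-8")` gives `BCT_HTML`, and anything not in the table falls through to `BCT_UNKNOWN`, which is where you would decide whether to save the body to a file or reject it.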

Also take a look at function pointers, both in "Polymorphism (in C)" and in "How do function pointers in C work?". Function pointers could be a more elegant way to solve your problem than having large switch statements throughout your code. With function pointers you can have a single call site in your program that behaves differently depending on which function the pointer currently refers to.

I will try to explain below with a browser as an example.

So let's say your browser just got back an HTTP response from some server. The HTTP response looks something like this in C.

struct http_res
{
    struct http_header *header;
    struct http_body *body;

    int (*decode_body)(char **);
};

So first your HTTP parser will parse the HTTP header and figure out whether it's a valid response, whether there's content, and so on. If there is content, the parser will check its type and, depending on whether it's HTML, JavaScript, CSS, or whatever, set the function pointer to point at the right function to decode the HTTP body.

static int decode_javascript(char **body)
{
    /* Whatever it takes to parse javascript from http. */
    return 0;
}

static int decode_html(char **body)
{
    /* Whatever it takes to parse html from http. */
    return 0;
}

static int decode_css(char **body)
{
    /* Whatever it takes to parse css from http. */
    return 0;
}

int parse_http_header(struct http_res *http)
{
    /* ... lots of other code to figure out the content type and
       store it in body_content_type. ... */

    switch(body_content_type)
    {
        case BCT_JAVASCRIPT:
          http->decode_body = &decode_javascript;
          break;

        case BCT_HTML:
          http->decode_body = &decode_html;
          break;

        case BCT_CSS:
          http->decode_body = &decode_css;
          break;

        default:
          printf("Error can't parse body type.\n");
          return -1;
    }
    return 0;
}
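If the switch starts to grow, one alternative (a sketch under my own assumptions, not something the code above requires) is to keep the decoders in a table indexed by content type, so adding a new type means adding one table entry. The enum values, the `decode_fn` typedef, `lookup_decoder`, and the placeholder strings the stubs produce are all hypothetical names made up for this example.

```c
#include <stddef.h>

/* Hypothetical content-type tags; BCT_COUNT is only the table size. */
enum body_content_type { BCT_JAVASCRIPT, BCT_HTML, BCT_CSS, BCT_COUNT };

/* Same signature as decode_body in struct http_res. */
typedef int (*decode_fn)(char **);

/* Stub decoders: each one just hands back a placeholder string. */
static int decode_javascript(char **body) { *body = "decoded javascript"; return 0; }
static int decode_html(char **body)       { *body = "decoded html";       return 0; }
static int decode_css(char **body)        { *body = "decoded css";        return 0; }

/* One entry per content type; this table replaces the switch. */
static const decode_fn decoders[BCT_COUNT] = {
    [BCT_JAVASCRIPT] = decode_javascript,
    [BCT_HTML]       = decode_html,
    [BCT_CSS]        = decode_css,
};

/* Return the decoder for a content type, or NULL if there is none. */
decode_fn lookup_decoder(enum body_content_type type)
{
    if ((unsigned)type >= BCT_COUNT)
        return NULL;
    return decoders[type];
}
```

The parser would then do something like `http->decode_body = lookup_decoder(type);` and treat a NULL result as the "can't parse body type" error case.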

Now when we pass the HTTP response to another part of the browser, that function can call decode_body() through the HTTP response object and it will end up with a decoded body it can understand, without knowing what it's decoding.

int next_function(struct http_res * res)
{
    char *decoded_body;
    int rtrn;

    /* Now we can decode the http body without knowing anything about
    it. We just call decode_body() and end up with a buffer with the
    decoded data in it. */
    rtrn = res->decode_body(&decoded_body);
    if(rtrn < 0)
    {
        printf("Can't decode body.\n");
        return -1;
    }

    return 0;
}

To make your program really modular, at least in C, you would put the various parts of your browser in different shared libraries: the HTTP parser, event library, JavaScript engine, HTML parser, and so on. Then you would define interfaces between the libraries, and you would be able to swap out each library for a different one without having to change your program; you would simply load a different library at run time. Take a look at Robert Martin (Uncle Bob), who talks about this extensively. This talk is good but it lacks slides: https://www.youtube.com/watch?v=asLUTiJJqdE (starts at 8:20). This one is also interesting, and it has slides: https://www.youtube.com/watch?v=WpkDN78P884 .

And finally, nothing about C, Perl or Python means you will have to recode your program logic. You will have to design your program so that the modules do not know about each other. Each module knows only about the interface, and if you connect two modules that both "speak" the same interface you will have created a modular system. It's just like how the internet works: the various computers on the internet do not need to know what the other computer is, what it's doing, or what its operating system is. All they need to know is TCP/IP, and they can communicate with all the other devices on the internet.
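As a rough sketch of what "speaking the same interface" could look like for the fetch -> convert -> parse chain in the question: every module exposes one function with the same signature, and the program flow is just an array of those functions. Everything here (`struct buffer`, `stage_fn`, `run_pipeline`, and the stub stages standing in for libcurl and a PDF converter) is hypothetical and invented for illustration, and the sketch deliberately ignores who owns and frees the buffers.

```c
#include <stddef.h>
#include <string.h>

/* The shared interface: a stage reads one buffer and produces another. */
struct buffer { const char *data; size_t len; };

typedef int (*stage_fn)(const struct buffer *in, struct buffer *out);

struct stage { const char *name; stage_fn run; };

static void set_text(struct buffer *out, const char *text)
{
    out->data = text;
    out->len = strlen(text);
}

/* Stub stages. A real one would call libcurl, a PDF converter, etc. */
static int fetch_stub(const struct buffer *in, struct buffer *out)
{
    (void)in;                       /* would be the URL to fetch */
    set_text(out, "%PDF-stub");
    return 0;
}

static int pdf_to_html_stub(const struct buffer *in, struct buffer *out)
{
    (void)in;                       /* would be the raw PDF bytes */
    set_text(out, "<html>stub</html>");
    return 0;
}

/* Feed the output of each stage into the next one.
   Returns 0 on success, -1 as soon as any stage fails. */
int run_pipeline(const struct stage *stages, size_t count,
                 struct buffer input, struct buffer *result)
{
    struct buffer current = input;
    size_t i;

    for (i = 0; i < count; i++) {
        struct buffer next = { NULL, 0 };

        if (stages[i].run(&current, &next) != 0)
            return -1;
        current = next;
    }
    *result = current;
    return 0;
}
```

Changing the program logic then means changing the array you pass in, for example `{ { "fetch", fetch_stub }, { "pdf-to-html", pdf_to_html_stub } }` followed by a stage that writes to a file instead of one that feeds the UI. `run_pipeline` itself never changes, and each stage could just as well come from a shared library or be wrapped in a small main() to act as a standalone program.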

2trill2spill
  • Thank you, I will have a look at what you suggested as it seems to be in the right direction towards what I am trying to do. The fact that I'm coding a browser is somewhat arbitrary, I'm more interested in using it to learn how to write modular programs. Would you mind expanding on your last two paragraphs a little more as I'm not certain what you mean? – Ellen May 15 '15 at 15:35
  • Yeah, no problem, and don't expect to read it all in one sitting. I only read a handful of pages a day. Also take a look at 21st Century C; the interfaces book is written in an older C style. – 2trill2spill May 15 '15 at 15:41
  • Thanks a ton! I think this is exactly what I was trying to figure out! :) I'm off to try and implement this so that I can understand it. ^_^ – Ellen May 15 '15 at 17:21