
I have a few legacy websites, and each has a LOT of static HTML pages. I would like to use an IIS module to capture the generated page content and add HTML snippets that give each page a new header and footer (the decorator pattern). Here is the code I have for the module. The odd thing is that in many tests the module appears to be invoked TWICE when a page is loaded, and each invocation passes only part of the page content to the module (the first invocation passes the top portion of the page and the second the remaining portion). I know the module is invoked twice because I used a static variable to count the invocations and show the count in the new header and footer (the two numbers are different, and the footer number is always 1 larger than the header number). I was also able to export the page content into two different files to prove it.

namespace MyProject
{
    public class MyModule : IHttpModule
    {
        public void Dispose()
        {
        }

        public void Init(HttpApplication application)
        {
            application.ReleaseRequestState += new EventHandler(this.My_Wrapper);
        }

        public String ModuleName
        {
            get { return "MyProject"; }
        }

        public void My_Wrapper(Object source, EventArgs e)
        {
            HttpApplication app = (HttpApplication)source;
            HttpContext context = app.Context;
            HttpRequest request = context.Request;
            string requestPath = request.Path.ToString();

            //I have guarding code here so that the following code only applies to 
            //web requests that has ".html" in the end.

            HttpContext.Current.Response.Filter = new WrapperFilter(HttpContext.Current.Response.Filter);
        }
    }

    public class WrapperFilter : MemoryStream
    {
        private static Regex startOfBody = new Regex("(?i)<body(([^>])*)>", RegexOptions.Compiled | RegexOptions.Multiline);
        private static Regex endOfBody = new Regex("(?i)</body>", RegexOptions.Compiled | RegexOptions.Multiline);

        private Stream outputStream = null;

        private static int index = 0;

        public WrapperFilter(Stream output)
        {
            outputStream = output;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            string contentInBuffer = UTF8Encoding.UTF8.GetString(buffer);
            string page = new StringBuilder(contentInBuffer).ToString();
            byte[] outputBuffer = null;
            Match matchStartOfBody = null;
            Match matchEndOfBody = null;

            index++;

            matchStartOfBody = startOfBody.Match(page);
            string header = "html snippets for header: " + index;
            page = startOfBody.Replace(page, "<body " + matchStartOfBody.Groups[1] + ">" + header);

            matchEndOfBody = endOfBody.Match(page); 
            string footer = "html snippets for footer: " + index;
            page = endOfBody.Replace(page, footer + "</body>");

            outputBuffer = UTF8Encoding.UTF8.GetBytes(page);
            outputStream.Write(outputBuffer, 0, outputBuffer.Length);
        }
    }
}

Questions:

  1. Is the module invoked twice because the page content is too large, or do I need to increase a cache? If so, how?

  2. Technically, is my approach going to work? I was able to decorate HTML pages, but because of the two-invocation process I am unable to handle some advanced situations.

  3. When an image needs to be displayed in a browser page, does the request for the image go through the IIS modules?

UPDATE

Based on the valuable input from usr, the "odd" behavior is just IIS's normal behavior. Because of his/her suggestion, I added a class variable:

private byte[] allContent = new byte[0];

and the following updated method:

    public override void Write(byte[] buffer, int offset, int count)
    {
        //new bigger array; only "count" bytes of buffer are valid
        byte[] newArr = new byte[allContent.Length + count];
        //copy old content
        System.Array.Copy(allContent, newArr, allContent.Length);
        //append new content, starting at "offset"
        System.Array.Copy(buffer, offset, newArr, allContent.Length, count);
        //reset current total content
        allContent = newArr;
    }

and added a new method with all the code copied from my earlier Write method:

    protected override void Dispose(bool disposing)
    {
    //code copied from my earlier code, with "buffer" changed to "allContent".
    }

Now everything works! Thank you, usr!!!

curious1
  • I don't know what the main bug would be. I advise you to drop the filter approach and instead modify the app to directly emit the right HTML. This generic will never work completely reliably. – usr Mar 22 '16 at 14:49
  • *modify the app to directly emit the right HTML*, this is not possible at this moment. *This generic will never work completely reliably* This is a very sweeping statement. Why say so? – curious1 Mar 22 '16 at 14:56
  • Anywhere the ` – usr Mar 22 '16 at 14:58
  • Thanks for your thought. Actually, I have guarding code in actual implementation to only decorate requests with ".html" in the end. – curious1 Mar 22 '16 at 16:04
  • Let's experiment: Replace `My_Wrapper` with `Debug.WriteLine(context.GetHashCode())`. Still two calls per request? Let's narrow the problem down. – usr Mar 22 '16 at 17:32
  • Thanks for asking the question about whether the page size matters. I did tests again. It does. For small pages, I see the same number in header and footer. For large pages, I see 3 and 4 or something like that. I experimented with the output caching config, which does not seem to work. – curious1 Mar 22 '16 at 18:16
  • No, this is not a framework bug or config issue. This is a bug in your code. We must find it. `because I used a static variable to capture the number of invocation and show it in the new header and footer` Can you post the code that does this? I did not read that sentence before to be honest. My spider sense is tingling now.... – usr Mar 22 '16 at 18:18
  • Thanks for asking about the static variable. I updated the code with the static variable "index". The content passed to the module in the two invocations contains the `<body>` and `</body>` respectively. – curious1 Mar 22 '16 at 18:24

1 Answer


OK, I should have solved this earlier. I admit I did not read every sentence of the question. I should have grown suspicious of the measurement. Turns out the measurement is broken.

> Thanks for asking the question about whether the page size matters. I did tests again. It does. For small pages, I see the same number in header and footer. For large pages, I see 3 and 4 or something like that.

Then:

    public override void Write(byte[] buffer, int offset, int count)
    {
        //...

        index++;

Write might be called an arbitrary number of times. This is a Stream implementation. Anyone can call Stream.Write as often as he wants to. You would expect that with any Stream.

The index can be incremented many times per page. The counting code is broken, the rest works.
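
One way to make the filter independent of how often `Write` is called is to buffer everything and decorate once at the end. The following is a minimal sketch, not a definitive implementation. It assumes the filter keeps deriving from `MemoryStream` and that the wrapped stream is disposed when the response ends. The class name `BufferedWrapperFilter` is hypothetical, and the header and footer strings are the placeholders from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical name: buffers the whole response and decorates it once.
public class BufferedWrapperFilter : MemoryStream
{
    private static readonly Regex StartOfBody =
        new Regex("<body([^>]*)>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    private static readonly Regex EndOfBody =
        new Regex("</body>", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    private readonly Stream outputStream;
    private bool written;

    public BufferedWrapperFilter(Stream output)
    {
        outputStream = output;
    }

    // Write is deliberately NOT overridden: MemoryStream.Write appends each
    // chunk (honoring offset and count) to its own growing buffer, no matter
    // how many chunks the caller chooses to send.

    protected override void Dispose(bool disposing)
    {
        if (disposing && !written)
        {
            written = true;

            // Decode the complete page in one step, so a multi-byte UTF-8
            // sequence is never cut in the middle.
            string page = Encoding.UTF8.GetString(ToArray());

            // MatchEvaluator keeps the original tag text, attributes included.
            page = StartOfBody.Replace(page, m => m.Value + "html snippets for header");
            page = EndOfBody.Replace(page, m => "html snippets for footer" + m.Value);

            byte[] outputBuffer = Encoding.UTF8.GetBytes(page);
            outputStream.Write(outputBuffer, 0, outputBuffer.Length);
            outputStream.Flush();
        }
        base.Dispose(disposing);
    }
}
```

In the module it would be installed the same way as before, for example `context.Response.Filter = new BufferedWrapperFilter(context.Response.Filter);`. Any per-page counter would then go in `Dispose`, which runs once per filter instance.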

Also, the UTF-8 processing is broken because you can't split UTF-8 encoded data at arbitrary boundaries.
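
To illustrate the UTF-8 point, here is a small sketch that compares decoding every chunk on its own (what the original `Write` does) with a stateful `Decoder` that carries an incomplete sequence over to the next chunk. The class and method names are hypothetical and exist only for this demonstration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, only for demonstrating the chunk-boundary problem.
public static class Utf8SplitDemo
{
    // Decodes each chunk independently, as the original Write method does.
    // A multi-byte character split across two chunks cannot be recovered.
    public static string DecodePerChunk(byte[][] chunks)
    {
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            sb.Append(Encoding.UTF8.GetString(chunk, 0, chunk.Length));
        }
        return sb.ToString();
    }

    // Uses one Decoder for all chunks; it remembers trailing bytes of an
    // incomplete sequence and completes the character with the next chunk.
    public static string DecodeWithState(byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

Feeding the two bytes of `"ö"` as separate one-byte chunks, `DecodePerChunk` does not give back `"ö"`, while `DecodeWithState` does. Buffering the whole page and decoding it once avoids the problem without needing a `Decoder` at all.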

usr
  • `Anyone can call Stream.Write as often as he wants to.` So the Write implementation of MY module can be called many times while IIS is serving a page? That is not a problem in itself. The problem is that the content passed to Write is NOT a COMPLETE page, and I want to add the footer content only when the header content has been added. So I need to test whether the header was added, which is not possible when the bottom portion arrives in the second invocation. – curious1 Mar 22 '16 at 18:32
  • `The counting code is broken`. I don't understand this. Could you please elaborate? `the UTF-8 processing is broken because you can't split UTF-8 encoded data at arbitrary boundaries.` Could you post the correct way? Thanks a lot for posting this answer. – curious1 Mar 22 '16 at 18:33
  • Well, you are implementing `Stream`. This means that callers treat your class like a stream. They can post the bytes they want to write in as many chunks as they want. They could feed you one byte at a time. You must deal with that. You could, for example, buffer all content in a `MemoryStream` and only when you are being disposed you decode, modify, encode and write to `outputStream`. Count in Dispose, that fixes the counting. The UTF8 thing is fixed by decoding everything at once. – usr Mar 22 '16 at 18:43
  • `Encoding.UTF8.GetBytes("ö")` gives you two bytes. But someone might write them one by one into your stream. You will attempt to decode the first byte which is invalid because the second byte is missing. That's the UTF8 bug here. – usr Mar 22 '16 at 18:44
  • usr, I got it working after research. I updated my code. If you see any error, PLEASE let me know. Thanks for the Dispose idea. You made my day!!! Thanks!!! – curious1 Mar 22 '16 at 19:29
  • Am I right that each image request also goes through all the IIS modules? Thanks! – curious1 Mar 22 '16 at 19:33
  • Looks good, but it can be extremely inefficient (the resizing might be O(n^2)). I'd use MemoryStream. You seem surprised that IIS might write byte by byte. Yet, this is the same thing *you* would do with any stream (such as MemoryStream or FileStream). That is normal. – usr Mar 22 '16 at 19:45
  • Thanks for your confirmation! `can be extremely inefficient (the resizing might be O(n^2)). I would use MemoryStream` Could you please show a code sample? I don't know how to do this. I come from the Java world. Another thing: am I right that each image request also goes through all the IIS modules? – curious1 Mar 22 '16 at 19:49
  • I have no idea about the images. This is actually a complicated issue on IIS. You must ask anew or Google. Write into a MemoryStream and at the end obtain its contents using GetBuffer or ToArray. If you can't figure it out, Google or ask anew. This was a fun question. I'm normally quicker when debugging impossible problems :) It is *very* common that asking users make false assumptions or misidentify the problem area ;-) – usr Mar 22 '16 at 19:51
  • Thanks for your follow-up! Best. – curious1 Mar 22 '16 at 20:03