I have a .NET Windows service that takes HTML content and generates Word 2007 files from it. Before the HTML is converted to a Word 2007 document, it is cleaned up (empty tags are removed, etc.) by a recursive function. However, some large HTML documents cause an "out of memory" exception because of the recursive function. I added a counter to the method so that the function is not called more than a fixed number of times, but as a result many HTML files either are not converted at all or are converted to malformed Word 2007 documents.
If I try to split the HTML source into pieces and process each one separately, it might complicate things: every HTML document has a different structure, and splitting the content would probably mean changing the cleanup code as well.
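To make the problem concrete, here is a minimal sketch of the shape of the cleanup, next to one direction I am considering that would avoid splitting the input: walking the tree with an explicit stack instead of recursive calls. This is not the actual service code. It is written in Python only for brevity, and `Node`, `is_empty`, `clean_recursive`, and `clean_iterative` are hypothetical names. It also assumes "empty" simply means no text and no children, which the real cleanup would have to refine (for example, `<br>` or `<img>` should not be removed).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal HTML node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def is_empty(node: Node) -> bool:
    # Assumption: a node is "empty" when it has no text and no children.
    return not node.text.strip() and not node.children


def clean_recursive(node: Node) -> bool:
    """Recursive shape: one call frame per nesting level of the HTML."""
    node.children = [c for c in node.children if clean_recursive(c)]
    return not is_empty(node)


def clean_iterative(root: Node) -> bool:
    """Same pruning with an explicit stack, so depth is not tied to the call stack."""
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not children_done:
            # Revisit this node only after all of its descendants are cleaned.
            stack.append((node, True))
            for child in node.children:
                stack.append((child, False))
        else:
            # Children are already cleaned, so drop those that ended up empty.
            node.children = [c for c in node.children if not is_empty(c)]
    return not is_empty(root)  # False means the root itself is empty
```

Both functions prune the tree in place and return whether the root still has content. The iterative version keeps its bookkeeping in an ordinary list, so a deeply nested document adds entries to that list rather than call frames.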
I need some suggestions on how to handle this problem.
Any help would be very much appreciated.