I'm currently building an Instapaper clone and need some help designing the algorithm.
It has two components:
- Extract the main text block from an HTML document
- If the saved article spans more than one page, extract the text from all pages
Can you guys point me in the right direction? I will be using C# on .NET 4 for this project.