I'm currently creating an app that iterates through a number of URLs: for each one, it pulls down the page source and then extracts specific data using reference points such as element ids.
The source is loaded into a String object, which is then processed by finding the reference point with IndexOf and extracting the data with Substring.
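For reference, here is a minimal sketch of that IndexOf/Substring step as I understand it, assuming each value sits between a start marker and an end marker. `MarkerExtractor` and `ExtractBetween` are hypothetical names. Passing `StringComparison.Ordinal` is worth doing, because the default `IndexOf(string)` overload is culture-sensitive, which is slower and is not what you want when matching markup.

```csharp
using System;

#nullable enable

public static class MarkerExtractor
{
    // Hypothetical helper: returns the text between the first occurrence of
    // startMarker and the next endMarker after it, or null if either is missing.
    public static string? ExtractBetween(string source, string startMarker, string endMarker)
    {
        // Ordinal search: byte-for-byte matching, no culture rules applied.
        int start = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;

        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Only this final Substring allocates a new (small) string.
        return source.Substring(start, end - start);
    }
}
```

For the example line further down, `MarkerExtractor.ExtractBetween(html, "id=\"link-I-need\">", "</a>")` returns `Hyperlink`. Note that this still requires the whole page to be in one `source` string, which is the allocation the question is about.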
The problem is that the String object ends up in generation 2 of the garbage collector, so it stays in memory for a while before being collected. As a result, the app's memory usage keeps growing as it accesses more and more URLs.
I ran the app and processed 25 URLs: memory usage jumped to 300 MB, and after a while (I assume once garbage collection had fired) it fell back down to 1 MB.
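A small sketch for checking the generation behaviour, assuming the downloaded pages are larger than about 85,000 bytes. In .NET, objects of 85,000 bytes or more are allocated on the Large Object Heap (LOH), which is only collected together with generation 2, so `GC.GetGeneration` should already report 2 for such a string right after it is created. .NET strings are UTF-16 (2 bytes per char), so a page of roughly 42,500 characters or more crosses that threshold. `LohDemo` and `FreshStringGenerations` are hypothetical names.

```csharp
using System;

public static class LohDemo
{
    // Returns the GC generation of a freshly allocated small string and of a
    // freshly allocated large string.
    public static (int SmallGen, int LargeGen) FreshStringGenerations()
    {
        // 50,000 UTF-16 chars is about 100,000 bytes, above the 85,000-byte
        // threshold, so this string is allocated directly on the LOH.
        string large = new string('x', 50_000);

        // 1,000 chars (about 2,000 bytes) goes on the small object heap and
        // starts out in generation 0.
        string small = new string('x', 1_000);

        return (GC.GetGeneration(small), GC.GetGeneration(large));
    }
}
```

If the LOH explanation applies here, `LohDemo.FreshStringGenerations()` should report a lower generation for the small string than for the large one, which never passes through the cheaper generation 0 and 1 collections.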
Since I only need the source for the short time it takes to extract the data, is there a more optimised way of doing this?
Note that I can't read the source in chunks, because a chunk boundary could fall part way through a reference point.
For example, the line
...<a href="http://www.some-website.com/" id="link-I-need">Hyperlink</a>...
could be split into two chunks like this:
First chunk: ...<a href="http://www.some-website.com/" id="link-
Second chunk: I-need">Hyperlink</a>...
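One possible way around the split-marker problem, offered as a sketch under stated assumptions rather than a proven fix: read the page in fixed-size chunks, but carry the unprocessed tail of each chunk over into the next one. While no start marker has been found, only the last `startMarker.Length - 1` characters need to be kept, since they may be the beginning of a marker cut by the chunk boundary; once a start marker has been found but not its end marker, everything from the start marker onwards is kept. With a chunk size of a few thousand characters, every string involved stays well below the LOH threshold. `StreamingExtractor`, `ExtractAll` and the default chunk size of 4096 are hypothetical; the `TextReader` could be, for example, a `StreamReader` over the HTTP response stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamingExtractor
{
    // Hypothetical helper: yields every value found between startMarker and
    // endMarker while reading the source in chunks, so the whole page is
    // never held in one large string.
    public static IEnumerable<string> ExtractAll(
        TextReader reader, string startMarker, string endMarker, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
            throw new ArgumentException("Markers must be non-empty.");

        var buffer = new char[chunkSize];
        var pending = new StringBuilder(); // unprocessed text carried between chunks
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, read);
            string window = pending.ToString();
            int consumed = 0; // everything before this index is fully processed

            while (true)
            {
                int start = window.IndexOf(startMarker, consumed, StringComparison.Ordinal);
                if (start < 0)
                {
                    // No complete start marker left: keep only the last
                    // (startMarker.Length - 1) chars, which may be the first
                    // part of a marker split across the chunk boundary.
                    consumed = Math.Max(consumed, window.Length - (startMarker.Length - 1));
                    break;
                }

                int valueStart = start + startMarker.Length;
                int end = window.IndexOf(endMarker, valueStart, StringComparison.Ordinal);
                if (end < 0)
                {
                    // The value or its end marker continues in the next chunk:
                    // keep everything from the start marker onwards.
                    consumed = start;
                    break;
                }

                // Only the small extracted value is allocated as a new string.
                yield return window.Substring(valueStart, end - valueStart);
                consumed = end + endMarker.Length;
            }

            pending.Remove(0, consumed);
        }
    }
}
```

For the example line above, `ExtractAll(new StringReader(html), "id=\"link-I-need\">", "</a>", chunkSize: 7)` yields `Hyperlink`, even though the 17-character marker necessarily straddles several 7-character chunks. The trade-off is that a single value longer than the chunk size makes `pending` grow until its end marker arrives.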