First, the IDE I am using is Visual C# with the .NET Framework.
Okay, so I have about 20,000 HTML documents containing information I need to extract and sort into date order.
The date in each file is stored within this HTML tag:
<td valign="top" class="createdate">
Tuesday, 03 April 2012 20:39
</td>
Note: all of the dates are in that format within each HTML file.
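A minimal sketch of pulling that date out of one file's HTML, assuming the cell always looks like the sample above (a `td` with `class="createdate"` in double quotes) and that day and month names are English. It uses a regular expression to grab the cell contents and `DateTime.TryParseExact` with the custom format `"dddd, dd MMMM yyyy HH:mm"`. The names `CreateDateExtractor` and `Extract` are hypothetical. If the markup varies more than this, an HTML parser such as HtmlAgilityPack would be more robust than a regex.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CreateDateExtractor
{
    // Captures the text inside <td ... class="createdate"> ... </td>,
    // ignoring the whitespace/line breaks around it.
    // Assumption: the attribute is written exactly as class="createdate".
    private static readonly Regex CreateDateCell = new Regex(
        @"<td[^>]*class=""createdate""[^>]*>\s*(.*?)\s*</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the parsed date, or null if the cell is missing
    // or its text is not in the expected format.
    public static DateTime? Extract(string html)
    {
        Match m = CreateDateCell.Match(html);
        if (!m.Success) return null;

        // Matches text such as "Tuesday, 03 April 2012 20:39".
        DateTime date;
        if (DateTime.TryParseExact(m.Groups[1].Value, "dddd, dd MMMM yyyy HH:mm",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
        {
            return date;
        }
        return null;
    }
}
```

Returning `null` instead of throwing lets a batch run over thousands of files skip (or log) the odd file without a date rather than stopping.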
I want to extract the date, then automatically read through each HTML document and count the occurrences of a word or phrase.
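Counting with `IndexOf` also matches inside longer words ("cat" in "concatenate") and misses a phrase that HTML has wrapped onto two lines. One way around both, sketched below, is a regex with word boundaries where the gaps between the phrase's words match any run of whitespace. This assumes case-insensitive counting is wanted and that the phrase starts and ends with a letter or digit (`\b` behaves differently next to symbols such as `#`). `PhraseCounter` and `Count` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseCounter
{
    // Counts case-insensitive, whole-word occurrences of a word or phrase.
    // Whitespace between the phrase's words may match any run of whitespace
    // in the text, so a phrase split across a line break still counts.
    public static int Count(string text, string phrase)
    {
        // A null separator array means "split on any whitespace".
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            throw new ArgumentException("The phrase must contain at least one word.", nameof(phrase));

        // Escape regex metacharacters in each word, join the words with \s+,
        // and anchor both ends at word boundaries.
        string pattern = @"\b" + string.Join(@"\s+", words.Select(w => Regex.Escape(w))) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

For example, in `"The cat sat. CAT! concatenate the\ncat  sat"` this counts `"cat"` three times and `"cat sat"` twice. Running it on the page's plain text rather than the raw HTML avoids counting words that appear inside tags or attributes.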
I am not asking anyone to write the entire program for me, but if you could give as much detail as possible on how I could go through these 20,000 HTML files, extract the date and the number of occurrences of a word or phrase, and then export that information to Word or Excel, I would be very grateful.
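A sketch of the batch part, under a few assumptions: the files end in `.html`, live under a single folder (subfolders included), and are UTF-8 or carry a byte-order mark, since that is what `File.ReadAllText` detects by default. The date extraction and counting are passed in as functions, so any string methods you already have can be plugged in. The results are sorted by date and written as CSV, which Excel opens directly; the date is written as `yyyy-MM-dd HH:mm` so it sorts correctly and Excel recognises it. `FileResult`, `Report`, `ProcessFolder` and `WriteCsv` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// One result row per HTML file.
public sealed class FileResult
{
    public string FileName { get; }
    public DateTime Date { get; }
    public int Count { get; }

    public FileResult(string fileName, DateTime date, int count)
    {
        FileName = fileName; Date = date; Count = count;
    }
}

public static class Report
{
    // Reads every *.html file under 'folder' and applies the supplied
    // date-extraction and counting functions to its contents.
    // Files for which extractDate returns null are skipped.
    public static List<FileResult> ProcessFolder(
        string folder,
        Func<string, DateTime?> extractDate,
        Func<string, int> countOccurrences)
    {
        var results = new List<FileResult>();
        // EnumerateFiles streams the paths instead of building one big array first.
        foreach (string path in Directory.EnumerateFiles(folder, "*.html", SearchOption.AllDirectories))
        {
            string html = File.ReadAllText(path);
            DateTime? date = extractDate(html);
            if (date == null) continue;
            results.Add(new FileResult(Path.GetFileName(path), date.Value, countOccurrences(html)));
        }
        return results;
    }

    // Sorts the rows by date and writes them as CSV (header row first).
    public static void WriteCsv(IEnumerable<FileResult> rows, TextWriter writer)
    {
        writer.WriteLine("Date,File,Count");
        foreach (FileResult r in rows.OrderBy(r => r.Date))
        {
            writer.WriteLine(string.Join(",",
                r.Date.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture),
                Quote(r.FileName),
                r.Count.ToString(CultureInfo.InvariantCulture)));
        }
    }

    // Standard CSV escaping: wrap in quotes and double any embedded quotes.
    private static string Quote(string s) => "\"" + s.Replace("\"", "\"\"") + "\"";
}
```

To write a real file, wrap a `StreamWriter` in a `using` block and pass it to `WriteCsv`. Excel can then save the result as `.xlsx` if needed, and the table can be pasted into Word from there.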
Oh, and I am using the data for research for my dissertation. I know how to do string manipulation and how to use the string methods, such as finding the occurrences of a word.
The problem I am having is how to get the HTML data (or maybe just the text content) and then sort it into a usable format. Thank you.
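For "just the content", a rough sketch that turns an HTML string (for example from `File.ReadAllText`) into plain text with regular expressions and `System.Net.WebUtility.HtmlDecode`. This is an approximation, not a real HTML parser: it assumes the pages are reasonably simple (no `>` inside attribute values or comments). It drops `<script>` and `<style>` blocks, replaces the remaining tags with spaces, decodes entities such as `&amp;` and `&nbsp;`, and collapses whitespace. `HtmlText` and `ToPlainText` are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // <script>...</script> and <style>...</style> blocks, including their contents.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag, e.g. <p>, </td>, <br />.
    private static readonly Regex Tag = new Regex(@"<[^>]+>", RegexOptions.Singleline);

    // Runs of whitespace (in .NET, \s also matches the non-breaking space from &nbsp;).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        string s = ScriptOrStyle.Replace(html, " ");
        s = Tag.Replace(s, " ");            // replace tags with spaces so words don't run together
        s = WebUtility.HtmlDecode(s);       // &amp; -> &, &lt; -> <, &nbsp; -> U+00A0, ...
        return Whitespace.Replace(s, " ").Trim();
    }
}
```

Decoding happens after the tags are removed, so an encoded `&lt;b&gt;` in the text stays as literal text instead of being mistaken for a tag.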