I have to implement a simple semantic analysis of a text (an 800 MB txt file). For small files everything runs quickly. I read the file line by line, and that works: reading the file alone takes 9 s. But once I add the analysis, that is, splitting each line into words, adding the words to the dictionary and storing their positions in the text, the processing takes far too long.
Could you suggest a better variation, or a better overall solution to this problem? Any advice on semantic analysis of text and on this procedure would be appreciated. Thanks.
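For context, the line-by-line reading can be sketched roughly as follows. This is only a minimal sketch, not the real program: `ReadLoopSketch`, `ProcessLines` and the sample input are hypothetical names and data, and the two processing steps are passed in as delegates so that the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ReadLoopSketch
{
    // Hypothetical driver: reads the input one line at a time and hands each
    // line to the two processing steps, supplied here as delegates.
    public static int ProcessLines(TextReader reader,
                                   Func<string, List<string>> splitWords,
                                   Action<List<string>> addToDictionary)
    {
        int lineCount = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            addToDictionary(splitWords(line));
            lineCount++;
        }
        return lineCount;
    }

    static void Main()
    {
        // Stand-in input (assumption): the real program would open the
        // 800 MB file with a StreamReader instead of a StringReader.
        using (var reader = new StringReader("first line\nsecond line"))
        {
            int words = 0;
            int lines = ProcessLines(
                reader,
                s => new List<string>(s.Split(' ')), // placeholder splitter
                list => words += list.Count);        // placeholder consumer
            Console.WriteLine(lines + " lines, " + words + " words");
        }
    }
}
```

In the real code the two delegates correspond to `SplitWords` and `AddToDictonary` shown below.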
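    public List<string> SplitWords(string s)
    {
        // Normalize case, then split on runs of non-word characters.
        s = s.ToLower();
        arrayWords = Regex.Split(s, @"\W+");
        listWords = arrayWords.OfType<string>().ToList();
        for (int i = 0; i < listWords.Count; i++)
        {
            // Drop stop words (stopwords must be sorted for BinarySearch)
            // and tokens shorter than two characters.
            if (Array.BinarySearch(stopwords, listWords[i]) >= 0 || listWords[i].Length < 2)
            {
                listWords.RemoveAt(i);
                i--; // stay on the same index, since the list shifted left
            }
        }
        return listWords;
    }
Code for separating the words.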
    public void AddToDictonary(List<string> arrayWords)
    {
        for (int i = 0; i < arrayWords.Count; i++)
        {
            // i is the word's index within this list, stored as its position.
            if (!dictonary.ContainsKey(arrayWords[i]))
            {
                dictonary.Add(arrayWords[i], new List<int>() { i });
            }
            else
            {
                dictonary[arrayWords[i]].Add(i);
            }
        }
    }
Code for adding the words to the dictionary.