Please help me resolve this issue. I have a huge input.txt. It is 465 MB now, but later it will be at least 1 GB.

The user enters a term (not necessarily a whole word). Using that term, I need to find every word that contains it, wrap that word in <strong> tags, and save the result to output.txt. The term search should be case-insensitive.

This is what I have so far. It works on small texts, but fails on bigger ones.

Regex regex = new Regex(" "); 

string text = File.ReadAllText("input.txt"); 
Console.WriteLine("Please, enter a term to search for"); 
string term = Console.ReadLine(); 

string[] w = regex.Split(text); 

for (int i = 0; i < w.Length; i++) 
{ 
    if (Processor.Contains(w[i], term, StringComparison.OrdinalIgnoreCase)) 
    { 
        w[i] = @"<strong>" + w[i] + @"</strong>"; 
    } 
} 

string result = null; 
result = string.Join(" ", w); 

File.WriteAllText("output.txt", result);
Dmitry Bychenko
ShHolmes
  • Possible duplicate of [How can I read, replace and write very large files?](http://stackoverflow.com/questions/10146573/how-can-i-read-replace-and-write-very-large-files) – Nasreddine May 19 '16 at 08:51
  • _It works on small texts, but doesn't on bigger ones._ Meaning what? Does it crash? Don't let us guess! "Doesn't work" is __not a helpful__ problem description! – TaW May 19 '16 at 08:53
  • What exactly is the problem with larger files? The size of the `text` variable? – Mong Zhu May 19 '16 at 08:56
  • If this is a real application, it would be the right time to become familiar with databases ;-) – Tim Schmelter May 19 '16 at 09:00
  • I get an OutOfMemoryException from File.ReadAllText(). – ShHolmes May 19 '16 at 09:04
  • At a certain point reading the entire file into memory becomes impossible; stream it in batches (line by line). – user6144226 May 19 '16 at 09:15

3 Answers

Reading the entire file in one go is what causes your OutOfMemoryException. Read the file in stages instead. The FileStream and BufferedStream classes provide ways of doing this:

https://msdn.microsoft.com/en-us/library/system.io.filestream(v=vs.110).aspx

https://msdn.microsoft.com/en-us/library/system.io.bufferedstream.read(v=vs.110).aspx
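
To illustrate reading in stages, here is a minimal sketch, not a definitive implementation. It assumes that words are separated by single spaces, that a term never spans a line break, and that the file actually contains line breaks (a single gigabyte-long line would still be loaded whole). The names `Highlighter`, `HighlightLine` and `HighlightFile` and the 64 KB buffer size are illustrative choices, not part of the question.

```csharp
using System;
using System.IO;
using System.Linq;

static class Highlighter
{
    // Wraps every space-separated word that contains the term
    // (case-insensitive) in <strong> tags.
    public static string HighlightLine(string line, string term)
    {
        var words = line.Split(' ').Select(word =>
            word.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
                ? "<strong>" + word + "</strong>"
                : word);
        return string.Join(" ", words);
    }

    // Streams the input to the output one line at a time, so only the
    // current line (plus the stream buffers) is held in memory.
    public static void HighlightFile(string inputPath, string outputPath, string term)
    {
        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (var buffered = new BufferedStream(input, 1 << 16)) // assumed 64 KB buffer
        using (var reader = new StreamReader(buffered))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(HighlightLine(line, term));
            }
        }
    }
}
```

A call such as `Highlighter.HighlightFile("input.txt", "output.txt", term)` would replace the `ReadAllText` / `WriteAllText` pair from the question. Note that an empty term matches every word, and that line endings are rewritten as `Environment.NewLine`.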

Eric Yeoman

Try not to load the entire file into memory, and avoid huge GB-sized arrays, strings, etc. (you may simply not have enough RAM). Can you process the file line by line, i.e. is it true that no term spans multiple lines? If so, then

  ...
  var source = File
    .ReadLines("input.txt") // ReadLines (not ReadAllLines) reads the file lazily, line by line
    .Select(line => line.Split(' ')) // You don't need Regex here, just Split
    .Select(items => items
      // Wrap every word that contains the term (case-insensitive) in <strong> tags
      .Select(item => item.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
         ? "<strong>" + item + "</strong>"
         : item))
    .Select(items => String.Join(" ", items));

  // WriteAllLines enumerates the lazy query, so lines are written as they are read
  File.WriteAllLines("output.txt", source);
Dmitry Bychenko

Read the file line by line (or buffer several lines at a time). It is a bit slower, but it should work.

There can still be a problem if all the lines match your term, since the results you collect would then be as large as the input. Consider writing results to a temporary file as you find them, and then just rename/move that file to the destination folder.
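
The temporary-file idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the `SafeOutput` class, the `TransformToFile` method and the `.tmp` suffix are hypothetical names, and the per-line transformation is passed in as a delegate so the sketch does not depend on any particular highlighting code.

```csharp
using System;
using System.IO;
using System.Linq;

static class SafeOutput
{
    // Writes the transformed lines to a temporary file next to the destination,
    // then moves it into place only after the whole input has been processed.
    public static void TransformToFile(
        string inputPath, string outputPath, Func<string, string> transform)
    {
        string tempPath = outputPath + ".tmp"; // hypothetical naming convention

        try
        {
            // ReadLines is lazy, so lines are read and written one at a time.
            File.WriteAllLines(tempPath, File.ReadLines(inputPath).Select(transform));

            // File.Move fails if the destination exists, so remove it first.
            if (File.Exists(outputPath))
            {
                File.Delete(outputPath);
            }
            File.Move(tempPath, outputPath);
        }
        finally
        {
            // Clean up the temporary file if something failed before the move.
            if (File.Exists(tempPath))
            {
                File.Delete(tempPath);
            }
        }
    }
}
```

For example, `SafeOutput.TransformToFile("input.txt", "output.txt", line => line.ToUpperInvariant())` would leave any existing output.txt untouched unless the whole input was processed successfully.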

RokX