
I'm trying to check whether a large text document (about 500,000 lines) contains a specific line. The problem appears when I search for it this way:

string searchLine = "line 4";

using (StreamReader sr = new StreamReader(filePath)) 
{
   string contents = sr.ReadToEnd();
   if (contents.Contains(searchLine))
   {
      Console.WriteLine("line exist");
   }
   else
   {
      Console.WriteLine("line does not exist");
   }
}

The document content is shown below. I never write duplicates to it, so all strings are unique:

line 1
line 2
line 3
line 4
line 5
line 47

I get "line exist" for "line 4", which is right. But if I then remove that line from the file and check for the same string "line 4" again, it still says "line exist", apparently because it finds the characters "line 4" inside "line 47". Only after I also remove "line 47" do I get "line does not exist".

So I'm wondering how to find a specific line, matched exactly, in a large text document.

  • Are you sure it was **line 74** misleading? I believe not. Maybe you can try find with notepad++/ultraedit. – Lei Yang Nov 02 '16 at 01:42
  • You can include the `Environment.NewLine` in your `searchLine`. And as Lei Yang said, **line 74** should it be **line 47**? – Prisoner Nov 02 '16 at 01:45
  • `Contains` does not handle the case where a line has all the characters you are searching for plus more characters after them. So, yeah, line 47, like @Alex said, will return true. – Paul L Nov 02 '16 at 02:00
  • @Lei Yang yes line 47 edited –  Nov 02 '16 at 02:59

2 Answers


sr.ReadToEnd() does not read the file line by line; it reads all characters from the current position to the end of the stream, so Contains then searches the whole text for a substring.

The ReadLine() method, by contrast, reads one line of characters from the current stream and returns it as a string (or null at the end of the stream).

Using ReadLine(), you can read the file line by line and compare each line exactly:

string currentLine;
bool exist = false;

using (StreamReader sr = new StreamReader(filePath))
{
    // ReadLine returns null once the end of the stream is reached.
    while ((currentLine = sr.ReadLine()) != null)
    {
        if (currentLine == "line 4")
        {
            exist = true;
            break; // lines are unique, so stop at the first match
        }
    }
}

Console.WriteLine(exist ? "line exist" : "line does not exist");

Alternatively you can also compare with:

string.Equals(currentLine, "line 4")

instead of

currentLine == "line 4"
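
The same line-by-line check can also be written more compactly. The following is only a sketch, not part of the original answer: the class name `LineSearch` and the method name `ContainsExactLine` are made up for illustration, and it assumes a case-sensitive, ordinal comparison is what is wanted.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, named here for illustration only.
public static class LineSearch
{
    // Returns true if at least one line of the file equals searchLine exactly
    // (ordinal, case-sensitive comparison; an assumption about the desired matching).
    public static bool ContainsExactLine(string filePath, string searchLine)
    {
        // File.ReadLines enumerates the file lazily, so Any can stop
        // reading as soon as the first matching line is found.
        return File.ReadLines(filePath)
                   .Any(line => string.Equals(line, searchLine, StringComparison.Ordinal));
    }
}
```

It could be called as `bool exist = LineSearch.ContainsExactLine(filePath, "line 4");`. Because whole lines are compared, a file containing only "line 47" does not count as containing "line 4".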
Jim
  • I would suggest `String.Equals` as well, if all you want is a line-by-line exact match. And before you ask: the time taken to read 500,000 lines is negligible. – Paul L Nov 02 '16 at 02:42
  • @Jim yes, that's what I'm looking for, works great. And yes, the result should be a bool, to know exactly whether a single string exists or not. The previous solution was also good, but it matched lines I did not ask for as well as the one I asked for. –  Nov 02 '16 at 02:44
  • You're welcome, Tim. *(added it to the post Paul, thanks)* – Jim Nov 02 '16 at 02:52
  • @Jim I don't really get the difference between `if (currentLine == "line 4")` and `string.Equals(currentLine, "line 4")`? –  Nov 02 '16 at 02:57
  • @TimR you can read about it [here](http://stackoverflow.com/questions/1659097/why-would-you-use-string-equals-over). Short answer: when both operands are typed as `string`, `==` is overloaded to compare content, the same as `.Equals()`; it only falls back to reference comparison when the operands are typed as `object`. – Jim Nov 02 '16 at 03:04
  • @TimR don't worry about it too much if you are only comparing strings; both have the same outcome here. – Jim Nov 02 '16 at 03:12

You can use the following code to search for exact content.

public string ExactReplace(string input, string find, string replace)
{
    // Escape the search text so characters such as '.' or '(' are matched literally.
    string textToFind = string.Format(@"\b{0}\b", Regex.Escape(find));
    return Regex.Replace(input, textToFind, replace);
}

and then you can call it like

string fulltext = sr.ReadToEnd();
string result = ExactReplace(fulltext, "line 4", "");

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

For more on Word Boundaries
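
To connect the word-boundary rules back to the original question, here is a small sketch that uses the same `\b` pattern for an existence check instead of a replacement. The names `WordBoundaryDemo` and `ContainsWholeWord` are hypothetical and not from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here for illustration only.
public static class WordBoundaryDemo
{
    // Returns true if `find` occurs in `input` with a word boundary on both sides.
    public static bool ContainsWholeWord(string input, string find)
    {
        // Regex.Escape makes the search text literal; \b anchors it at word boundaries.
        string pattern = @"\b" + Regex.Escape(find) + @"\b";
        return Regex.IsMatch(input, pattern);
    }
}
```

With this pattern, "line 4" is not found in "line 47", because there is no boundary between the two word characters 4 and 7. Note that this is a whole-word match, not a whole-line match: text such as "see line 4 here" would still match, so comparing complete lines is the stricter check.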

Mohit S
  • Solution by Jim above does exactly what I've asked in question, but seems like this solution is useful too –  Nov 02 '16 at 03:03