
I am trying to read the last serial number in a large text file using the Regex code below. Each line that carries a serial number has two spaces before and after the number, at the beginning of the line. This takes quite a long time when the file is large. Is it possible in C# to read the text file from the end towards the beginning, so that the first capture from a single Match call gives me the answer and the time taken is reduced? Thanks in advance.

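// Requires: using System.IO; using System.Text.RegularExpressions; using System.Windows.Forms;
string contents = File.ReadAllText(path);
string pattern = @"(?<=\s{2})\d{1,7}(?=\s{2})"; // two whitespace chars, 1-7 digits, two whitespace chars
MatchCollection matches = Regex.Matches(contents, pattern);
string lastmatch = string.Empty;
foreach (Match s in matches)
{
   lastmatch = s.Value; // overwritten on every iteration, so it ends up holding the last match
}
MessageBox.Show(lastmatch);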
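A possible way to get the "first match is the answer" behaviour, sketched under the assumption that the whole file still fits in memory: .NET's `RegexOptions.RightToLeft` makes the engine start scanning at the end of the input, so a single `Match` call returns the last serial number without enumerating every match. Lookahead and lookbehind keep their direction in this mode, so the same pattern works. The class `LastSerial` and method `FindLast` are hypothetical names. This still reads the whole file with `File.ReadAllText`; it only removes the cost of collecting all the matches.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class LastSerial
{
    // Same pattern as the question's code; RightToLeft starts the search at the end of the input.
    static readonly Regex SerialPattern =
        new Regex(@"(?<=\s{2})\d{1,7}(?=\s{2})", RegexOptions.RightToLeft);

    // Returns the last serial number in the text, or string.Empty if there is none.
    public static string FindLast(string contents)
    {
        Match m = SerialPattern.Match(contents);
        return m.Success ? m.Value : string.Empty;
    }

    static void Main(string[] args)
    {
        // Pass a file path on the command line; otherwise a small sample text is used.
        string contents = args.Length > 0
            ? File.ReadAllText(args[0])
            : "  1  Blah Blah Blah.  \n  2  Ding Dong Bell.  \n  3  Hello, how are you.  \n";
        string last = FindLast(contents);
        Console.WriteLine(last.Length > 0 ? last : "(no serial number found)");
    }
}
```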

The text file looks like this:

  1  Blah Blah Blah.  
  2  Ding Dong Bell.  
  3  Hello, how are you.  
  4  My name is Unnikrishnan.  
  5  You are a very good friend.  
Avinash Raj
Unnikrishnan
  • To read your file from the end, see this question/answer: http://stackoverflow.com/a/4368915/499581 – l'L'l Oct 24 '15 at 10:34
  • Yes, this answer is correct. Using the suggested answer, even if I read more lines, say 10 or 15, the answer is delivered within a second. Fantastic answer; this is the amazing power of computer programming. Thanks very much. – Unnikrishnan Oct 24 '15 at 10:47
  • You're welcome! It might help others also if you post the solution that you ended up with as an answer; you can then mark it correct when it allows you. :) – l'L'l Oct 24 '15 at 10:55
  • Do I need to post the tweaked answer based on the already published solution? – Unnikrishnan Oct 24 '15 at 10:58
  • You don't need to post your answer. If that answered your question, mark it as a duplicate. Your question is not really about matching from the end, because you already did that; it is about reading a file from the end, so it's a duplicate. If you can really improve your answer to match from the end, then post your own better answer. – M.kazem Akhgary Oct 24 '15 at 11:27
  • @M.kazem Akhgary I posted my answer because Mr. l'L'l told me to post it. Anyway, while I thank every one of you, special thanks to M.kazem Akhgary for the logic in his comments, which I liked best. – Unnikrishnan Oct 24 '15 at 12:01

1 Answer


Here is how I tweaked the answer I found on Stack Overflow for my purpose. The particular text file in my case was 75 MB, and there are even larger files I want to examine. Whatever the file size, I get the answer in the blink of an eye.

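public int w { get; set; } // last serial number found in the tail of the file

    public void determineSizeOfFile()
    {
        // Not used at present. Intended to find how many serial-numbered items the file
        // contains, i.e. the last serial number. Requires: using System.IO;
        // using System.Text.RegularExpressions; fileToProcess holds the path of the file.
        const int tailBytes = 60000; // how far back from the end to start reading
        var pattern = new Regex(@"(?<=\s{2})\d{1,7}(?=\s{2})"); // built once, not once per line
        using (var reader = new StreamReader(fileToProcess)) // remarkable solution learnt from Stack Overflow
        {
            // Seek only when the file is longer than the tail being read; seeking to a
            // position before the start of the file throws an IOException.
            if (reader.BaseStream.Length > tailBytes)
            {
                reader.BaseStream.Seek(-tailBytes, SeekOrigin.End);
            }
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Match match = pattern.Match(line);
                if (match.Success)
                {
                    w = int.Parse(match.Value); // later matches overwrite earlier ones
                }
            }
        }
    }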
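Because the gap between the last serial number and the end of the file can vary (as the comments point out), a fixed 60,000-byte tail may turn out too small. A minimal sketch of one alternative, assuming UTF-8 or ASCII text: read a tail window, search it with `RegexOptions.RightToLeft`, and double the window until a match is found or the whole file has been read. `TailSearch`, `FindLastSerial` and the 4,096-byte starting window are hypothetical choices, not part of the answer above. The window is held in a single byte array, so this assumes the searched region stays well below 2 GB.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class TailSearch
{
    static readonly Regex SerialPattern =
        new Regex(@"(?<=\s{2})\d{1,7}(?=\s{2})", RegexOptions.RightToLeft);

    // Searches the last initialWindow bytes; if nothing matches, doubles the window
    // and searches again, until the whole file has been examined.
    public static string FindLastSerial(string path, int initialWindow = 4096)
    {
        if (initialWindow <= 0) throw new ArgumentOutOfRangeException(nameof(initialWindow));
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long length = stream.Length;
            long window = Math.Min(initialWindow, length);
            while (true)
            {
                stream.Seek(-window, SeekOrigin.End);
                var buffer = new byte[window];
                int read = 0;
                while (read < buffer.Length)
                {
                    int n = stream.Read(buffer, read, buffer.Length - read);
                    if (n == 0) break; // file shrank while reading
                    read += n;
                }
                // Assumes UTF-8/ASCII. A window that starts mid-line or mid-character only
                // damages its first, partial line; the lookbehind needs two real whitespace
                // characters, so a serial number cut at the window edge is not matched.
                string tail = Encoding.UTF8.GetString(buffer, 0, read);
                Match m = SerialPattern.Match(tail);
                if (m.Success) return m.Value;
                if (window == length) return string.Empty; // whole file searched, no match
                window = Math.Min(window * 2, length);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: TailSearch <path>"); return; }
        Console.WriteLine(FindLastSerial(args[0]));
    }
}
```

If the last serial number straddles the window edge, nothing later in the window can match (it is the last one), so the window simply grows and the number is found on the next pass.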
Unnikrishnan
  • Your solution checks if the file is larger than 1kB, but seeks 60kB backwards. You should update the `if` clause to match this value. Also, you only need to go backwards enough to match the last line. If your lines are shorter than 60kB, reducing this value will speed up searching. Also note that a different answer in the same thread where you got this [shows how to seek backwards until you find a newline character](http://stackoverflow.com/a/4368913/69809), which is more fail-proof since you don't need to guess the size. – vgru Oct 24 '15 at 11:23
  • The file actually contains, after each serial number, several lines of text without serial numbers, then another serial number and several more lines of text. So I did not apply an exact number of kB to be read, because this can vary. Moreover, I wanted to examine a chunk of the last part of the file for other purposes too. This is just a modified example, done on the spur of the moment when I found a solution on Stack Overflow. – Unnikrishnan Oct 24 '15 at 11:28
  • I will also go through the other examples in the page there. – Unnikrishnan Oct 24 '15 at 11:29
  • Reducing -60000 to -1000 will also solve my original problem. Thank you very much. – Unnikrishnan Oct 24 '15 at 11:34
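The comment above suggests seeking backwards until a newline is found instead of guessing a size. Below is an independent sketch of that technique (not the code from the linked answer), assuming UTF-8 or ASCII text, where the byte 0x0A only ever means `'\n'`: read fixed-size blocks from the end, split on `'\n'`, and yield lines last-to-first, stopping at the first line that contains a serial number. This also copes with the trailing lines that have no serial number. `ReverseLines`, `ReadLinesBackward` and `FindLastSerial` are hypothetical names; CRLF endings are handled, lone `'\r'` line breaks are not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class ReverseLines
{
    static readonly Regex SerialPattern =
        new Regex(@"(?<=\s{2})\d{1,7}(?=\s{2})", RegexOptions.RightToLeft);

    // Yields the lines of a UTF-8/ASCII file from last to first, reading blockSize bytes at a time.
    public static IEnumerable<string> ReadLinesBackward(string path, int blockSize = 4096)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (stream.Length == 0) yield break;
            var block = new byte[blockSize];
            var pending = new List<byte>(); // bytes of the current line, in reverse order
            long position = stream.Length;
            bool atLastByte = true;          // a final '\n' does not start an extra empty line

            while (position > 0)
            {
                int count = (int)Math.Min(blockSize, position);
                position -= count;
                stream.Seek(position, SeekOrigin.Begin);
                int read = 0;
                while (read < count)
                {
                    int n = stream.Read(block, read, count - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }
                for (int i = count - 1; i >= 0; i--)
                {
                    if (block[i] == (byte)'\n')
                    {
                        if (!atLastByte) yield return Decode(pending);
                        pending.Clear();
                    }
                    else
                    {
                        pending.Add(block[i]);
                    }
                    atLastByte = false;
                }
            }
            yield return Decode(pending); // the first line of the file
        }
    }

    // Restores byte order, drops a trailing '\r' (CRLF endings), and decodes as UTF-8.
    static string Decode(List<byte> reversed)
    {
        byte[] bytes = reversed.ToArray();
        Array.Reverse(bytes);
        int length = bytes.Length;
        if (length > 0 && bytes[length - 1] == (byte)'\r') length--;
        return Encoding.UTF8.GetString(bytes, 0, length);
    }

    // Returns the last serial number in the file, or string.Empty if there is none.
    public static string FindLastSerial(string path)
    {
        foreach (string line in ReadLinesBackward(path))
        {
            Match m = SerialPattern.Match(line);
            if (m.Success) return m.Value; // leaving the loop disposes the iterator and closes the file
        }
        return string.Empty;
    }

    static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: ReverseLines <path>"); return; }
        Console.WriteLine(FindLastSerial(args[0]));
    }
}
```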