I have some log files that is written by log4cpp format
--By the nature of log4cpp, this file is sorted by the datetime at the beginning of each line
Assuming the format is like
2012-09-02 17:17:36.891 This is line 1 in file 2
...
2013-08-05 14:17:35.344 This is line 607082 in file 2
2013-08-05 14:17:36.891 This is line 607083 in file 2
...
2013-09-05 14:27:36.891 This is line 934594 in file 2
Now I am writing a program to parse these files and try to quickly locate a line.
For example, if I run
./my_program -start_time "2013-08-05 14:17:36" file_2.txt
I am expecting this program can return 607083 as a result.
Also, the -start_time can be based upon other granularity like "2013-08-05 14:17:35.899" or "2013-08-15" But I am expecting the nearest result.
I can traverse this file line by line, and compare the timestamp at the beginning of each line(just use string comparison) , but it will take O(N) time. I already implemented that and found it's really slow if there are millions of lines at the beginning to skip.
I am wondering if we can use Binary Search for this. I think it is the best way to return the nearest result and only takes O(lgN) time