3

I have a text file containing thousands of lines. What would be the optimal way to find if a certain string exists in the file or not?

Should I read the whole file into a string and then use the String.contains method, or create a list of all the lines with Files.readAllLines and then loop through the list, checking whether each line contains the required string?

Update: I am using Java 7. The search is limited to 1-2 string searches per file (10 files). The string to be searched changes with the file. I want to stop the search as soon as the string is found.

mayur2j
  • 1
    Look up substring searching algorithms like [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm) and [Aho-Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) – Courage May 08 '17 at 05:03
  • It's hard to answer your question without knowing how often this is going to happen. i.e. is it a once-off search? is the search going to happen often but the input string to find changes often? is the search going to happen often but the input file changes often? – Catchwa May 08 '17 at 05:11
  • What is your requirement exactly? Do you want to stop the moment the string is found in any line, or do you want to print all occurrences? – akhil_mittal May 08 '17 at 05:13

3 Answers

6

Assuming you are using Java 8 and the file is large, it is better to use the Streams API. There are two cases: either you want to return as soon as you locate the first line containing stringToSearch, or you want to go through all the lines looking for stringToSearch. The sample code looks like this:

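import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

String fileName = "c://SomeFile.txt";
String stringToSearch = "dummy";
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    // Find first: stops reading at the first matching line
    Optional<String> lineHavingTarget = stream.filter(l -> l.contains(stringToSearch)).findFirst();
    // Search all: a stream can be consumed only once, so use this
    // instead of the findFirst() call above, not after it
    // stream.filter(l -> l.contains(stringToSearch)).forEach(System.out::println);
    // do whatever
} catch (IOException e) {
    // log exception
}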

So reading all the lines of a file into memory at once seems a bad idea. It's better to read it line by line. If you are interested in learning about the fastest string search algorithm, then check this link.
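If all you need is a yes/no answer, as the question asks, the same stream can be terminated with anyMatch, which also stops at the first matching line. The following is only a minimal sketch: the class name StreamSearch and the method containsString are hypothetical, and it assumes Java 8 and a UTF-8 encoded file (the default for Files.lines(Path)).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper; the class and method names are illustrative only.
class StreamSearch {

    // Returns true if any line of the file contains the target string.
    // Files.lines(Path) decodes the file as UTF-8 and reads it lazily.
    static boolean containsString(Path file, String target) throws IOException {
        // try-with-resources closes the underlying file handle
        try (Stream<String> lines = Files.lines(file)) {
            // anyMatch short-circuits: no further lines are read after a match
            return lines.anyMatch(line -> line.contains(target));
        }
    }
}
```

A call such as StreamSearch.containsString(Paths.get(fileName), stringToSearch) would then replace the findFirst() variant when the matching line itself is not needed.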

akhil_mittal
5

Since the file contains a lot of lines, it is a better idea to read it line by line instead of loading all of its content into program memory. So essentially: read one line, check it for the presence of your string, and move on to the next.
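Since the question mentions Java 7, the line-by-line approach described above can be sketched without streams. This is only an illustration under stated assumptions: the class name LineSearch and the method containsString are hypothetical, and the file is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; the class and method names are illustrative only.
class LineSearch {

    // Returns true as soon as one line contains the target string.
    // Assumes the file is UTF-8 text; pass a different charset if it is not.
    static boolean containsString(Path file, String target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(target)) {
                    return true; // stop here: the rest of the file is never read
                }
            }
        }
        return false; // reached end of file without a match
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo on a temporary file
        Path tmp = Files.createTempFile("linesearch", ".txt");
        Files.write(tmp, Arrays.asList("first line", "a dummy line", "last line"),
                StandardCharsets.UTF_8);
        System.out.println(containsString(tmp, "dummy"));
        System.out.println(containsString(tmp, "absent"));
        Files.delete(tmp);
    }
}
```

Only one line is held in memory at a time, and the early return covers the "stop the search if the string is found" requirement. A string that spans a line break would not be found this way, since each line is checked separately.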

Yash Soni
0

There is little benefit to keeping the lines in a list. Both methods you propose suffer from the same caveat, though: they load the entire file into memory.

If you only care about specific lines in the file, you probably don't want to keep unwanted lines in memory. If you are using Java 8, you can use Files.lines() to read the file line by line with a stream. Otherwise, Guava's LineProcessor can do this as well.

This example uses streams to find all lines which match a string and return them in a list.

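List<String> lines;
// try-with-resources closes the file once the stream has been consumed
try (Stream<String> stream = Files.lines(path)) {
    lines = stream
            // findFirst() can be used instead of collect() to get the first match and stop.
            .filter(line -> line.contains("foo"))
            .collect(Collectors.toList());
}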

This one does it using Guava.

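import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// readLines takes the file's charset as its second argument; UTF-8 is assumed here.
List<String> lines = Files.readLines(file, StandardCharsets.UTF_8, new LineProcessor<List<String>>() {

    // Lines that matched so far
    private final List<String> matches = new ArrayList<>();

    @Override
    public boolean processLine(String line) throws IOException {
        if (line.contains("foo"))
            matches.add(line);
        return true; // return false to stop
    }

    @Override
    public List<String> getResult() {
        return matches;
    }

});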
killjoy