2

I'm attempting to write a Concordance program in Java using JDK 8 (to relearn the language). So far, I have the following (package name omitted):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Main {

    public static Map<String,Integer> generateConcordance(String fileName) throws IOException {
        Map<String,Integer> concordance = new HashMap<>();

        for (String line : Files.readAllLines(Paths.get(fileName))) {
            for (String word : line.split("[\\p{Punct}\\s]+")) {
                concordance.merge(word, 1, Integer::sum);
            }
        }

        return concordance;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("invalid operation - you must specify an input file");
        } else {
            try {
                Map<String, Integer> concordance = generateConcordance(args[0]);
                concordance.forEach((key,value) -> System.out.println(key + "\t" + value));
            } catch (IOException error) {
                System.err.println("I/O error - unable to read the file: " + args[0]);
            }
        }

    }

}

This works, but I'd like to generalize the implementation to be able to read input from System.in if no arguments are passed. Is it possible to modify the input to the function generateConcordance to allow it to be applied to either a named file or System.in?

  • Why not overload generateConcordance() with no params and give it the default behavior with System.in? – Julien Leray Jun 02 '15 at 03:16
  • btw, about optional params in Java: http://stackoverflow.com/questions/965690/java-optional-parameters – Julien Leray Jun 02 '15 at 03:17
  • I'd like my implementation to be as elaboration tolerant as possible. The algorithm behind `generateConcordance` ought not change if I'm reading a user specified file or `System.in` (which is basically the analogue to `stdin` in C). –  Jun 02 '15 at 03:21
  • yes so... I don't understand what is wrong, usually you would use optional parameters no? – Julien Leray Jun 02 '15 at 03:26
  • I guess I don't quite understand your suggestion. What I'm basically attempting to do is to see if I can generalize the outer for loop: `for (String line : Files.readAllLines(Paths.get(fileName)))` to operate on either `System.in` or `fileName`. Reading the documentation I don't see a straightforward way of doing this. –  Jun 02 '15 at 03:35
  • You asked how to generalize the implementation, of the function, no the loop. So if you want to change only the loop, I don't think it's possible. It's why I sugested you to overload your method or use a trick for emulate optional parameter on your function. Finaaly you will be able to call your method with or without parameter and have the good behavior. – Julien Leray Jun 02 '15 at 03:43

1 Answers1

1

With a little bit of refactoring, and using Stream<String> this should be pretty straightforward. You could try the following (untested):

public class Main {

  private static final Pattern wordBreak = Pattern.compile("[\\p{Punct}\\s]+");

  public static Map<String, Long> generateConcordance(Stream<String> lines) {
    return lines
      .flatMap(wordBreak::splitAsStream)
      .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
  }

  public static Map<String, Long> generateConcordance(String fileName) throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
      return generateConcordance(lines);
    }
  }

  public static Map<String, Long> generateConcordance(InputStream in) {
    InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
    return generateConcordance(new BufferedReader(reader).lines());
  }

  public static void main(String[] args) {
    try
    {
      generateConcordance("SomeRandomFile.txt");
      generateConcordance(System.in);
    }
    catch (IOException e)
    {
      e.printStackTrace(System.err);
    }
  }
}
clstrfsck
  • 14,715
  • 4
  • 44
  • 59
  • exactly what I would do. – Julien Leray Jun 02 '15 at 03:51
  • 1
    You can also do `lines.flatMap(Pattern.compile("[\\p{Punct}\\s]+")::splitAsStream)...` to compile the regex just once. Or store the compiled pattern in a constant and then `lines.flatMap(wordBreak::splitAsStream)...` – Misha Jun 02 '15 at 04:03
  • 1
    It's good to close your IO streams. Currently they are not closed automatically. – Tagir Valeev Jun 02 '15 at 07:40
  • I don't think you would want to close the `InputStream` in the `generateConcordance(...)` method. To me it would seem more logical to do this in the caller, where it was opened, probably using try-with-resources as an ownership thing in case the caller wanted to rewind a `FileInputStream` (for example) and do something else with it. Unfortunately closing the wrappers (`InputStreamReader` and `BufferedReader`) closes the underlying stream too. – clstrfsck Jun 02 '15 at 07:55
  • 1
    @msandiford: I agree about `generateConcordance(InputStream in)`, but it's created and not closed in `Map generateConcordance(String fileName)` method. See [Files.lines](https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#lines-java.nio.file.Path-java.nio.charset.Charset-) documentation. – Tagir Valeev Jun 02 '15 at 08:44
  • Generally, I appreciate structuring code but splitting such a simple operation into four methods merely consisting of one or two lines does *not* add to readability. `generateConcordance(Files.lines(Paths.get("SomeRandomFile.txt")));` and `generateConcordance(new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8)).lines());` will do, maybe with some added local variables which are required for a proper `try(…) {}` statement anyway. – Holger Jun 02 '15 at 09:24