3

I have an input file that looks like this

@id1   1.2   3.4
@id2   6.8   8.1
@id3   1.5   9.4
@id4   5.9   2.7

I would like to store the numbers only in a 2D array, and forget about the 1st column that contains the @id. I also want to use streams only for that operation.

So far I made 2 methods :

First method read the input file and store each lines in a List, as an array of string :

private List<String[]> savefromfile(String filePath) throws IOException {

        List<String[]> rowsOfFile = new LinkedList<String[]>();
        try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
            lines.forEach(line -> {
                String rows[] = line.trim().split("\\s+");
                rowsOfFile.add(rows);
            });
            lines.close();
        }
        return rowsOfFile;

The second method take as an input the List, and return a 2D Array that contains only the columns numbers :

private double[][] storeAllID(List<String[]> rowsOfFile) {
        int numberOfID = rowsOfFile.size();
        double[][] allID = new double[numberOfID][2];
        int i = 0;
        for (String[] line : rowsOfFile) {
            double id[] = new double[2];
            id[0] = Double.parseDouble(line[1]); 
            id[1] = Double.parseDouble(line[2]);
            allID[i++] = id;
        }
        return allID;
    }

Is there a way to make this code more efficient ? I want only one, short method that read the input file and return a 2D array containing numbers only. I don't think it's necessary to write 20 lines of code to do this.

  • 4
    You have to be careful what you ask for. "less code" isnt necessarily more efficient. I hope you understand that alone the progress of creating a stream is more expensive than "old school" iterating with loops. So, if you worry about performance: make sure to properly measure your code (talking about performance improvements without measurement is pointless). – GhostCat Jun 24 '19 at 09:15
  • I think you could use a map operation to convert each line into a 1D array containing the two numbers in the line. Then use toArray to get an array containing all the arrays ? – Patrick Jun 24 '19 at 09:16
  • I should have added that the input file contains typically millions of ID. So for really big input files, streams seems to be more efficient. Also, you're right, I should have specified that I want a more readable code. – Wassim Alsafwi Jun 24 '19 at 09:25
  • You come into an allocation problem, if you want to automate this with Java 8 Streams, since the size of an array can not grow. Therefore you would have to work with Collections, if you want to use Streams and then repack the results into an arrays, as you already did.If you mean "memory gently" with efficient, maybe you can use Collections instead of array in your further code. If you mean "less code", there is nothing wrong with "helper functions" like storeAllID. Maybe think about creating a library, where you outsource functionality like that. For me that's one part of my daily business. – Guardian667 Jun 24 '19 at 09:28
  • thank you for your insightful feedback. Indeed, I wanted to make my code more readable and compact. As for performance issues, do you suggest using BufferReader instead ? How can me measure the performance between these 2 methods ? – Wassim Alsafwi Jun 24 '19 at 10:14
  • 1
    For measuring, start here: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java ... of course, experiment that involves reading from files and processing, that is a really difficult undertaking. It takes a lot of insight into your exact use case and overall setup to define a reasonable test procedure. – GhostCat Jun 24 '19 at 14:07

1 Answers1

3

You aren't really gaining any benefit on your use of a stream in savefromfile, since you are using it exactly like it was a plain for-loop. To make the code a bit cleaner, you could get rid of the local variable completely, and also the call to close() is unnecessary as you are using try-with-resources already.

private List<String[]> savefromfile(String filePath) throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
        return lines
            .map(line -> line.trim().split("\\s+"))
            .collect(Collectors.toCollection(LinkedList::new));
    }
}

I don't know why you want to separate the parsing to double[][] into a separate method, as you could do it within your stream with a map:

    private double[][] loadFromFile(String filePath) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
            return lines
                .map(line -> line.trim().split("\\s+"))
                .map(line -> new double[] {
                    Double.parseDouble(line[1]),
                    Double.parseDouble(line[2])
                })
                .toArray(double[][]::new);
        }
    }

For performance, you'll just have to measure for yourself if using lower-level data types and loops would be worth the added complexity.

vlumi
  • 1,321
  • 1
  • 9
  • 18
  • thank you for this feedback. Your code is very clean, readable, thats what I was looking for. As for performance issues, do you suggest using BufferReader as it would be more efficient ? Thank you – Wassim Alsafwi Jun 24 '19 at 10:10
  • The implementation of `Files.lines()` in JDK (at least 8) uses `BufferedReader`: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/be30cb2a3088/src/share/classes/java/nio/file/Files.java#l3743 – vlumi Jun 24 '19 at 10:21
  • @WassimAlsafwi I'm glad you solved your problem. Feel free to [accept this answer](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) if you feel it was useful to you. – Samuel Philipp Jun 24 '19 at 14:28