
I have relatively inefficient CSVReader code, see below. It takes more than 30 seconds to read 30,000+ lines. How can I speed up this reading process as much as possible?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DataReader {

    private String csvFile;
    private List<String> sub = new ArrayList<String>();
    private List<List<String>> master = new ArrayList<List<String>>();


    public void ReadFromCSV(String csvFile) {

        String line = "";
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            System.out.println("Header " + br.readLine());
            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] list = line.split(cvsSplitBy);
//                System.out.println("the size is " + country[1]);
                for (int i = 0; i < list.length; i++) {
                    sub.add(list[i]);
                }
                List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
//                master.add(new ArrayList<String>(sub));
                master.add(temp);
                sub.removeAll(sub);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(master);
    }

    public List<List<String>> getMaster() {
        return master;
    }

}

UPDATE: I have found that my code can actually finish the reading in less than 1 second if I run it on its own. This DataReader is used by my simulation model to initialize the relevant properties, and the part below, which consumes the imported data, is what TAKES 40 SECONDS TO FINISH! Could anyone help by looking at this part of the code?

//      add route network
        Network<Object> net = (Network<Object>)context.getProjection("IntraCity Network");
        IndexedIterable<Object> local_hubs = context.getObjects(LocalHub.class);
        for (int i = 0; i <= CSV_reader_route.getMaster().size() - 1; i++) {
            String source = (String) CSV_reader_route.getMaster().get(i).get(0);
            String target = (String) CSV_reader_route.getMaster().get(i).get(3);
            double dist = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(6));
            double time = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(7));

            Object source_hub = null;
            Object target_hub = null;
            Query<Object> source_query = new PropertyEquals<Object>(context, "hub_code", source);
            for (Object o : source_query.query()) {
                if (o instanceof LocalHub) {
                    source_hub = (LocalHub) o;
                }
                if (o instanceof GatewayHub) {
                    source_hub = (GatewayHub) o;
                }
            }

            Query<Object> target_query = new PropertyEquals<Object>(context, "hub_code", target);
            for (Object o : target_query.query()) {
                if (o instanceof LocalHub) {
                    target_hub = (LocalHub) o;
                }
                if (o instanceof GatewayHub) {
                    target_hub = (GatewayHub) o;
                }
            }

//          System.out.println(target_hub.getClass() + " " + time);
//          Route this_route = (Route) net.addEdge(source_hub, target_hub);
//          context.add(this_route);
//          System.out.println(net.getEdge(source_hub, target_hub));
            if (net.getEdge(source, target) == null) {
                Route this_route = (Route) net.addEdge(source, target);
                context.add(this_route);
//              this_route.setDist(dist);
//              this_route.setTime(time); }
            }
        }
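
My suspicion is that the two `PropertyEquals` queries inside the loop are the bottleneck, since each one scans the whole context for every CSV row. Below is a rough sketch of the same loop with the hub lookup built into a `Map` once, up front (assuming `getHub_code()` is the accessor behind the `hub_code` property, and that the hub objects rather than the code strings are what `addEdge` should receive, as the commented-out lines suggest). Is something like this the right direction?

// Build the code -> hub lookup once, instead of running two
// PropertyEquals queries per CSV row.
Map<String, Object> hubsByCode = new HashMap<String, Object>();
for (Object o : context.getObjects(LocalHub.class)) {
    hubsByCode.put(((LocalHub) o).getHub_code(), o);   // getHub_code() is an assumed accessor name
}
for (Object o : context.getObjects(GatewayHub.class)) {
    hubsByCode.put(((GatewayHub) o).getHub_code(), o); // getHub_code() is an assumed accessor name
}

List<List<String>> routes = CSV_reader_route.getMaster(); // call getMaster() once, not per iteration
for (List<String> row : routes) {
    Object source_hub = hubsByCode.get(row.get(0));
    Object target_hub = hubsByCode.get(row.get(3));
    double dist = Double.parseDouble(row.get(6)); // still unused, as in the original
    double time = Double.parseDouble(row.get(7)); // (setDist/setTime are commented out)

    if (net.getEdge(source_hub, target_hub) == null) {
        Route this_route = (Route) net.addEdge(source_hub, target_hub);
        context.add(this_route);
    }
}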

3 Answers


I don't have a CSV that big, but you could try the following:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1)
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}

EDIT: I tried it with a CSV sample with 50k entries and the code runs in less than one second.

  • It's not working; it reports `Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1 at java.base/java.nio.file.FileChannelLinesSpliterator.readLine(FileChannelLinesSpliterator.java:173)` – Jack Oct 25 '19 at 04:56
  • @Jack It's probably some problem with the encoding of your file. You can pass a `Charset` as the second parameter of the `Files.lines` method. Take a look at [this](https://stackoverflow.com/questions/26268132/all-inclusive-charset-to-avoid-java-nio-charset-malformedinputexception-input) or [this](https://stackoverflow.com/questions/13625024/how-to-read-a-text-file-with-mixed-encodings-in-scala-or-java) – Alex R Oct 25 '19 at 05:10
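
For reference, here is the same stream-based read with an explicit `Charset`, along the lines of the comment above (the `ISO_8859_1` choice is only an assumption; use whatever encoding the file was actually written with):

// Passing a Charset explicitly avoids the MalformedInputException that
// Files.lines throws when the file is not valid UTF-8.
Path csvPath = Paths.get("path/to/file.csv");
List<List<String>> master = Files.lines(csvPath, StandardCharsets.ISO_8859_1)
        .skip(1)
        .map(line -> Arrays.asList(line.split(",")))
        .collect(Collectors.toList());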

In your code you are doing many write operations just to add the values from the current row to your master list, which is not required. You can replace the existing code with the simple version given below.

Existing code:

String[] list = line.split(cvsSplitBy);
//                System.out.println("the size is " + country[1]);
for (int i = 0; i < list.length; i++) {
    sub.add(list[i]);
}

List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
//                master.add(new ArrayList<String>(sub));
master.add(temp);
sub.removeAll(sub);

Suggested code:

master.add(Arrays.asList(line.split(cvsSplitBy)));
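
Applied to the reader from the question, the whole loop body collapses to a single line and the `sub` field (with its `clone()` and `removeAll()` calls) can be dropped entirely, e.g.:

public void ReadFromCSV(String csvFile) {
    String line;
    String cvsSplitBy = ",";

    try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
        System.out.println("Header " + br.readLine());
        while ((line = br.readLine()) != null) {
            // one small allocation per row; no clone(), no removeAll()
            master.add(Arrays.asList(line.split(cvsSplitBy)));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}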

Extending the answer from @Alex R, you can process it in parallel as well, like this:

public static void main(String[] args) throws IOException {
    Path csvPath = Paths.get("path/to/file.csv");
    List<List<String>> master = Files.lines(csvPath)
            .skip(1).parallel()
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}
  • Yes, but if you don't want to keep track of line order and just want to process the data, then you can process it in parallel to get the result faster. – Atul Oct 25 '19 at 05:54
  • 2
    Reading in parallel may even slow the whole thing down... There are multiple posts about that, like [this](https://bytefish.de/blog/jdk8_files_lines_parallel_stream/) or [this](https://stackoverflow.com/questions/25711616/how-to-read-all-lines-of-a-file-in-parallel-in-java-8) – Alex R Oct 25 '19 at 05:58
  • Thanks @Alex R for the information. If you are using Java 9 or a later version, it works as expected; on Java 8, avoid parallel processing. – Atul Oct 25 '19 at 06:10
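
A middle ground discussed in these comments is to keep the file I/O sequential and only parallelize the splitting, for example (a sketch; whether it actually beats the sequential version depends on the file size and JDK version):

// Read sequentially (disk I/O rarely benefits from parallelism),
// then split the lines in parallel.
Path csvPath = Paths.get("path/to/file.csv");
List<String> lines = Files.readAllLines(csvPath);
List<List<String>> master = lines.parallelStream()
        .skip(1)
        .map(line -> Arrays.asList(line.split(",")))
        .collect(Collectors.toList());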