
I have two methods that do the same thing but are implemented slightly differently. Each walks through a directory, reads all the files in it, and counts how many files with a certain name are in the directory. I want to know which one is faster, but both take around 3-4 seconds (the directory contains millions of files). How can I tell which is really faster? Is there a way to compare their speed?

  1. method)

    private void getAllRelatedFilesEig(String corrId) throws InterruptedException, IOException
    {
        log.debug("Get all files with corrId=" + corrId + " from directory=" + processingDir);

        Profiler profiler = Profiler.createStarted();

        Files.list(Paths.get(processingDir))
            .filter(p -> p.getFileName().toString().indexOf("EPX_" + corrId + "_") >= 0)
            .forEach(path ->
            {
                try
                {
                    EPEXFile file = new EPEXFile(path);

                    if (file.isTranMessage())
                    {
                        if (file.isOrderMessage())
                        {
                            orderFiles.add(file);
                        }
                        else
                        {
                            tradeFiles.add(file);
                        }
                    }
                    else
                    {
                        infoFiles.add(file);
                    }
                }
                catch (IFException ex)
                {
                    log.error("Error creating EPEXFile object " + ex.getMessage());
                }
            });

        profiler.stop("allFilesWithSameCorrIdRetrieval");

        log.info(orderFiles.size() + " order files with corrId=" + corrId);
        log.info(tradeFiles.size() + " trade files with corrId=" + corrId);
        log.info(infoFiles.size() + " info files with corrId=" + corrId);

        profiler = Profiler.createStarted();

        profiler.stop("processFiles");

        orderFiles.clear();
        tradeFiles.clear();
        infoFiles.clear();
    }
    
  2. method)

    private void getAllRelatedFilesOrig(String corrId) throws InterruptedException, IOException {
        log.debug("Get all files with corrId=" + corrId + " from directory=" + processingDir);

        Path dirPath = Paths.get(processingDir);

        ArrayList<Path> fileList;

        Profiler profiler = Profiler.createStarted();

        try (Stream<Path> paths = Files.walk(dirPath)) {
            fileList = paths.filter(t -> (t.getFileName().toString().indexOf("EPX_" + corrId + "_") >= 0))
                    .collect(Collectors.toCollection(ArrayList::new));

            for (Path path : fileList) {
                try {
                    EPEXFile file = new EPEXFile(path);

                    if (file.isTranMessage()) {
                        if (file.isOrderMessage()) {
                            orderFiles.add(file);
                        } else {
                            tradeFiles.add(file);
                        }
                    } else {
                        infoFiles.add(file);
                    }
                } catch (IFException ex) {
                    log.error("Error creating EPEXFile object " + ex.getMessage());
                }
            }
        }
        profiler.stop("allFilesWithSameCorrIdRetrieval");

        log.info(orderFiles.size() + " order files with corrId=" + corrId);
        log.info(tradeFiles.size() + " trade files with corrId=" + corrId);
        log.info(infoFiles.size() + " info files with corrId=" + corrId);

        profiler = Profiler.createStarted();

        profiler.stop("processFiles");

        orderFiles.clear();
        tradeFiles.clear();
        infoFiles.clear();
    }
    

I tried to figure it out with the Profiler class, but I could not tell which one is faster: sometimes the first wins and sometimes the second. Is there a way to say which is faster in general? Even if the difference is small, it would help me to know which one it is.

Mad Scientist
  • You can try JMH: http://openjdk.java.net/projects/code-tools/jmh/ but mind that the result may be affected by file system efficiency – bonzo Nov 05 '18 at 12:50
  • Code review should go to https://codereview.stackexchange.com/ – azro Nov 05 '18 at 12:52
  • I have not *investigated* the methods, but they seem to be doing pretty much the same thing, so I would not be surprised if the times aren't that different, all the more since the bottleneck is probably the file system rather than the Java code. – user85421 Nov 05 '18 at 12:56
  • A minor note on the second implementation: Java ain't C. You don't declare your variables upfront at the beginning of the method. You declare them at the point where you need them, and with the smallest scope possible. – GhostCat Nov 05 '18 at 13:04

1 Answer


I recently wrote this method to compare two of my own methods that did exactly the same thing in different ways.

    private void benchMark() {
        long t, t1 = 0, t2 = 0;

        // Alternate the two methods so slow drift in system load affects both equally.
        for (int i = 0; i < 50; i++) {
            t = System.currentTimeMillis();
            method1();
            t1 += System.currentTimeMillis() - t;

            t = System.currentTimeMillis();
            method2();
            t2 += System.currentTimeMillis() - t;
        }

        // Totals over all 50 iterations, in milliseconds.
        System.out.println("Benchmarking\n\tMethod 1 took " + t1 + " ms\n\tMethod 2 took " + t2 + " ms");
    }

That's a crude way to do it, but it works: I found that one of my methods was consistently about 5% faster in every one of my tests.

I call the methods alternately, one right after the other, to reduce the effect of performance variations during the test.
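The same idea can be made a little more robust. The following is only a sketch under some assumptions, not a replacement for a real harness such as JMH: it uses `System.nanoTime()` for elapsed time, runs a few untimed warm-up rounds first, swaps which method goes first on every round, and reports the median instead of the sum so that a single slow outlier counts for less. The class name `PairedBenchmark`, the helper names, and the `busyWork` stand-in workload are all made up for illustration; replace the two `Runnable`s with the methods you actually want to compare.

```java
import java.util.Arrays;

public class PairedBenchmark {

    /** Times a single call of the task in nanoseconds. */
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Returns the median of the samples (sorts a copy, leaves the input alone). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /**
     * Runs both tasks `warmups` times untimed, then `rounds` timed rounds,
     * alternating which task goes first. Returns {median1, median2} in nanoseconds.
     */
    static long[] compare(Runnable first, Runnable second, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            first.run();
            second.run();
        }
        long[] t1 = new long[rounds];
        long[] t2 = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            if (i % 2 == 0) {
                t1[i] = timeOnce(first);
                t2[i] = timeOnce(second);
            } else {
                t2[i] = timeOnce(second);
                t1[i] = timeOnce(first);
            }
        }
        return new long[] { median(t1), median(t2) };
    }

    // Hypothetical CPU-bound stand-in so the sketch runs on its own.
    static volatile long sink;

    static void busyWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        sink = sum;
    }

    public static void main(String[] args) {
        // Placeholder workloads; substitute the two methods under test.
        Runnable method1 = () -> busyWork(200_000);
        Runnable method2 = () -> busyWork(400_000);
        long[] medians = compare(method1, method2, 5, 21);
        System.out.printf("Method 1 median: %.3f ms%n", medians[0] / 1e6);
        System.out.printf("Method 2 median: %.3f ms%n", medians[1] / 1e6);
    }
}
```

For methods that read a directory, keep in mind that the operating system usually caches directory data after the first pass, so the warm-up rounds also decide whether you are measuring a cold or a warm file system.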

Whole Brain