This is what I have so far:

import java.io.File;
import java.io.FileInputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.io.IOUtils;

public class Test {

    public static void main(String... args) {
        Pattern p = Pattern.compile("(?s).*(MyFunc[(](?s).*[)];)+(?s).*");

        File[] files = new File("C:\\TestDir").listFiles();
        showFiles(files, p);
    }

    public static void showFiles(File[] files, Pattern p) {
        for (File file : files) {
            if (file.isDirectory()) {
                System.out.println("Directory: " + file.getName());
                showFiles(file.listFiles(), p); // Calls same method again.
            } else {
                System.out.println("File: " + file.getAbsolutePath());

                String f;
                try {
                    f= IOUtils.toString(new FileInputStream(file.getAbsolutePath()), "UTF-8");
                    System.out.println(file.getName());
                    Matcher m = p.matcher(f);

                    if (m.find()) {
                        System.out.println(m.group());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    return;
                }
            }
        }
    }
} 

What I want to do is find every call of MyFunc in the files inside a certain directory (which may have subdirectories whose files should be checked too). The number of files is fairly large, but the code above is very slow even for a single 1 MB file. Do you have any idea how to achieve what I want? I didn't expect this to be so slow.

EDIT: If this can't be done efficiently by a simple program, please feel free to advise me on useful FREE frameworks. Thank you for your help, everyone.

Alkis Kalogeris
  • Are you limited to Java? This sounds like a re-implementation of the grep command in Linux. If you can, I would recommend using that. – johnsoe Nov 18 '13 at 21:44
  • No, it's not limited. I've thought about grep. Am I going to see a significant increase in performance? I'm limited to a Windows environment, so is there a Windows version? – Alkis Kalogeris Nov 18 '13 at 21:47
  • It's a Java project, so Eclipse does that too. But is there a way to export the results to a file? (The question stands, as I'm interested in an efficient way of doing this programmatically, but there's no harm in finding some tricks in Eclipse too.) – Alkis Kalogeris Nov 18 '13 at 21:57

3 Answers

You are likely taking a hit on creating a String out of each file's contents. This will stress the heap and garbage collector.

You can use the Scanner class to help with this, since it can search a stream without first loading the whole file into a String:

http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html

Additionally this has been answered here already:

Performing regex on a stream
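
To illustrate the Scanner suggestion, here is a minimal sketch, assuming a simplified pattern in which a call contains no nested parentheses. The class name `ScannerSearch`, the method `findCalls`, and the pattern itself are hypothetical and not taken from the question. Note that with a horizon of 0 the Scanner may still buffer a lot of input while it looks for the next match, so measure before relying on it.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerSearch {

    // Assumption: a call has no nested parentheses, so [^)]* is enough.
    // [^)] also matches line breaks, so calls spanning several lines are found.
    static final Pattern CALL = Pattern.compile("MyFunc\\([^)]*\\);");

    // Returns every MyFunc(...); call found in the given file.
    static List<String> findCalls(Path file) throws IOException {
        List<String> calls = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            String call;
            // A horizon of 0 means "no limit"; null means no further match.
            while ((call = scanner.findWithinHorizon(CALL, 0)) != null) {
                calls.add(call);
            }
        }
        return calls;
    }
}
```

Calling `ScannerSearch.findCalls(file.toPath())` from the else branch of `showFiles` would replace the `IOUtils.toString` and `Matcher` steps.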

Best of luck!

mohr_michael_a

This may help you along a little further:

http://www.java-tips.org/java-se-tips/java.util.regex/how-to-apply-regular-expressions-on-the-contents-of-a.html

Again, creating a String for each file is costly. This example uses memory-mapped files to reduce the load on the garbage collector: the mapped file contents live in native memory managed by the operating system rather than on the JVM heap.
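
A minimal sketch of the memory-mapped approach, written against the standard `java.nio` API rather than copied from the linked page. The class name `MappedSearch`, the method `findCalls`, and the simplified pattern (no nested parentheses) are assumptions. One caveat: decoding the bytes to characters still produces a CharBuffer on the JVM heap, so this avoids the intermediate byte array and String but not all heap use.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MappedSearch {

    // Assumption: a call has no nested parentheses.
    static final Pattern CALL = Pattern.compile("MyFunc\\([^)]*\\);");

    // Returns every MyFunc(...); call found in the given file.
    static List<String> findCalls(Path file) throws IOException {
        List<String> calls = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            if (channel.size() == 0) {
                return calls; // nothing to map or search
            }
            // Map the whole file read-only; the bytes stay outside the JVM heap.
            MappedByteBuffer bytes =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Decode to characters (this CharBuffer is allocated on the heap).
            CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
            // Matcher accepts any CharSequence, so no String copy is needed.
            Matcher m = CALL.matcher(chars);
            while (m.find()) {
                calls.add(m.group());
            }
        }
        return calls;
    }
}
```

A single mapping is limited to 2 GB, which is far above the 1 MB files mentioned in the question.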

mohr_michael_a
  • You should just edit your first answer. Please do that so I can select your answer and close this thread. And delete the second message. Thank you for your help. – Alkis Kalogeris Nov 19 '13 at 05:56

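The problem with your approach is the regular expression you're using. You're including .* at the beginning and at the end of your pattern, which increases the processing time dramatically: find() already searches the whole input, so the greedy .* only forces the engine to consume the entire file and then backtrack through it. Try the same code with the following regex: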

(MyFunc\\(.*?\\);)

You can also apply the enhancements proposed by the other answers but I am pretty sure your bottleneck is in the regex itself.
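
As a rough sketch of how the trimmed pattern could be used, the snippet below collects every match with a while loop. The class name `RegexSearch` and the method `findCalls` are hypothetical, and the `Pattern.DOTALL` flag is an assumption added here to keep the `(?s)` behaviour of the original pattern, so that calls spanning several lines still match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSearch {

    // Lazy .*? stops at the first ");" after "MyFunc(".
    // DOTALL (an assumption, see above) lets '.' match line breaks too.
    static final Pattern CALL =
            Pattern.compile("MyFunc\\(.*?\\);", Pattern.DOTALL);

    // Returns every match in the text, not just the first one.
    static List<String> findCalls(CharSequence text) {
        List<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(text);
        while (m.find()) {
            calls.add(m.group());
        }
        return calls;
    }

    public static void main(String[] args) {
        String sample = "x MyFunc(a); y MyFunc(b,\nc); z";
        for (String call : findCalls(sample)) {
            System.out.println(call);
        }
    }
}
```

Because the pattern starts with a literal, the engine can skip quickly over text that does not contain MyFunc instead of backtracking across the whole file.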

Good luck!

user3001267
  • Also, note that your code prints only the first occurrence of MyFunc. If you want to print all of them, change if (m.find()) to while (m.find()). – user3001267 Nov 19 '13 at 06:57