I am running an analysis at a fairly large scale (thousands of projects) in which I extract test framework usage from source code (e.g. detecting assertEquals to measure assert density). For this, I do not want to count any statements that have been commented out. To strip them, I have the following method:
    public static CharSequence replaceAllRegexInFile(CharSequence input, String regex) {
        if (regex == null || input == null) {
            return input;
        }
        Pattern pattern = Pattern.compile(regex);
        return pattern.matcher(input).replaceAll("");
    }
I am running this method with the following regex to remove Java comments:
    (\/\*([\S\s]+?)\*\/|(?s)/\*.*?\*/)
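To make the setup concrete, here is a minimal, self-contained sketch of how the method is used. It is an illustration under assumptions, not my actual pipeline: the class name `CommentStripExample`, the helper `countOccurrences`, and the sample source string are made up for this example, and `BLOCK_COMMENT` is a simplified pattern that keeps only the second alternative of the regex above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentStripExample {

    // Assumption: simplified block-comment pattern (second alternative of the regex above).
    static final String BLOCK_COMMENT = "(?s)/\\*.*?\\*/";

    // Same method as in the question: removes every match of the regex from the input.
    public static CharSequence replaceAllRegexInFile(CharSequence input, String regex) {
        if (regex == null || input == null) {
            return input;
        }
        Pattern pattern = Pattern.compile(regex);
        return pattern.matcher(input).replaceAll("");
    }

    // Hypothetical helper: counts literal occurrences of a token such as "assertEquals".
    static int countOccurrences(CharSequence text, String token) {
        Matcher matcher = Pattern.compile(Pattern.quote(token)).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample file content: one commented-out assert, one live assert.
        String source = "class T {\n"
                + "  /* assertEquals(1, a); */\n"
                + "  void t() { assertEquals(2, b); }\n"
                + "}\n";

        CharSequence stripped = replaceAllRegexInFile(source, BLOCK_COMMENT);

        System.out.println(countOccurrences(source, "assertEquals"));   // prints 2
        System.out.println(countOccurrences(stripped, "assertEquals")); // prints 1
    }
}
```

On a small input like this everything works as expected; the problem only shows up when many large files are processed at the same time.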
I am well aware that replaceAll allocates a lot of intermediate results while it builds and returns the final result. I could resort to using replace instead, but that would not allow me to use a regex to match the comments.
I understand why the heap space error is thrown, especially since I am streaming all files of all projects concurrently across my entire machine. This certainly uses a lot of resources, but I have been unable to find an alternative solution, since the regex replacement is definitely a requirement.
Any suggestions would be greatly appreciated.
You can find the stack trace below:
Exception in thread "main" java.lang.OutOfMemoryError
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at AnalysisRunner.startAnalysis(AnalysisRunner.java:33)
at AnalysisRunner.main(AnalysisRunner.java:26)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
at java.lang.StringBuffer.append(StringBuffer.java:350)
at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at Business.RegexService.replaceAllRegexInFile(RegexService.java:64)
at Business.FrameWorkDetectionService.extractAllResultsForFile(FrameWorkDetectionService.java:58)
at Business.FrameWorkDetectionService.lambda$extractFrameworkDependencies$0(FrameWorkDetectionService.java:39)
at Business.FrameWorkDetectionService$$Lambda$19/1175339539.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:1870)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2045)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:404)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at Business.FrameWorkDetectionService.extractFrameworkDependencies(FrameWorkDetectionService.java:39)
at Business.FrameWorkDetectionService.detectFrameworks(FrameWorkDetectionService.java:26)
at Business.FrameworkService.projectResults(FrameworkService.java:59)
at AnalysisRunner$$Lambda$13/1712669532.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
Is there an alternative solution that allocates less heap space and still allows me to remove all comments from a large number of files concurrently?
Any help is greatly appreciated!