Java's java.util.regex.Matcher replaceFirst(...)/replaceAll(...) API returns strings, which (with the default heap size) may well cause an OOME for inputs as large as 20-50M characters. These two methods can easily be rewritten to write to Writers rather than construct strings, effectively eliminating one point of failure.
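To illustrate the rewrite I have in mind, here is a minimal sketch (not a tested production implementation). The class name StreamingReplace and its method are hypothetical; it relies only on Matcher.appendReplacement(...)/appendTail(...), and flushes a small scratch buffer to the Writer after every match so the full result is never held in memory. It still buffers the text between two consecutive matches (and the tail after the last match), so it only helps when matches are reasonably frequent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
class StreamingReplace {
    // Same semantics as Matcher.replaceAll(replacement), but the result is
    // written to 'out' piece by piece instead of being returned as a String.
    static void replaceAll(Pattern pattern, CharSequence input,
                           String replacement, Writer out) {
        Matcher m = pattern.matcher(input);
        StringBuffer chunk = new StringBuffer(); // holds one gap + one replacement
        try {
            while (m.find()) {
                // Appends the text since the previous match, then the
                // replacement with $1-style group references expanded.
                m.appendReplacement(chunk, replacement);
                out.append(chunk);
                chunk.setLength(0); // reuse the buffer for the next match
            }
            m.appendTail(chunk); // text after the last match
            out.append(chunk);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This removes the output String, but the input still has to be a CharSequence, which is the remaining problem.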
The Matcher's factory method, however, only accepts CharSequences, so I am also likely to hit an OOME if I use Strings/StringBuffers/StringBuilders.
How do I wrap a java.io.Reader to implement a CharSequence interface (given the fact that my regexps may contain backreferences)?
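The difficulty is that a Reader is forward-only, while the regex engine needs random access through charAt(...) when it backtracks or checks a backreference. One direction I can imagine, assuming the input is a file rather than an arbitrary Reader, is a CharSequence backed by a memory-mapped buffer. The sketch below is only an illustration under strong assumptions: the class Latin1CharSequence is hypothetical, it assumes a single-byte encoding (ISO-8859-1) so that one byte is one char, and a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a read-only CharSequence view over a ByteBuffer,
// assuming ISO-8859-1 content (each byte maps to exactly one char).
final class Latin1CharSequence implements CharSequence {
    private final ByteBuffer bytes;
    private final int offset;
    private final int length;

    Latin1CharSequence(ByteBuffer bytes) {
        this(bytes, 0, bytes.limit());
    }

    private Latin1CharSequence(ByteBuffer bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Maps the whole file; the contents stay outside the Java heap.
    // Fails for files larger than Integer.MAX_VALUE bytes.
    static Latin1CharSequence map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return new Latin1CharSequence(
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        }
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Absolute get: does not move the buffer position.
        return (char) (bytes.get(offset + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // Returns a view; no bytes are copied.
        return new Latin1CharSequence(bytes, offset + start, end - start);
    }

    @Override
    public String toString() {
        // Copies the viewed range onto the heap; only safe for small ranges.
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

Something like Pattern.compile(...).matcher(Latin1CharSequence.map(path)) would then run the regex, backreferences included, without loading the file into a String. It does not cover multi-byte encodings or inputs that are not files.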
Is there any other solution which can replace regexps in files and is not OOME-prone on large inputs?
In other words, how do I implement a functionality similar to that of GNU sed in Java (as sed is known to tackle files as large as a couple terabytes, while featuring the same support for extended regular expressions)?
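As far as I understand, sed copes with such inputs because it processes one line at a time by default. A minimal sketch of that approach in Java, with the hypothetical class name LineSed, could look like the following. It assumes no match ever needs to span a line break and that no single line is too large for the heap; it also normalizes every line terminator to "\n" and always ends the output with one, because readLine() strips the original terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical helper mimicking "sed -E 's/pattern/replacement/g'".
class LineSed {
    static void substitute(Reader in, Writer out,
                           Pattern pattern, String replacement) {
        try {
            BufferedReader lines = new BufferedReader(in);
            String line;
            // Only one line is held in memory at any time.
            while ((line = lines.readLine()) != null) {
                // Backreferences work as usual, but only within the line.
                out.write(pattern.matcher(line).replaceAll(replacement));
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This keeps memory use proportional to the longest line rather than to the file size, but it is not equivalent to running a regex over the whole file.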