
We call a REST API that returns a very large string. We use

httpget.getResponseBodyAsString();

to get the string returned by the REST call, and then apply a regex to that string to extract the substring we need. When the returned string is huge, the JVM runs out of memory.

We can also get the data from the REST call as a stream using

httpget.getResponseBodyAsStream();

But is it possible to apply a regex to the stream and extract the string that we require?

SpikETidE
  • how big is the string? I don't expect `OutOfMemory` occurs even for 1mb string data – RaceBase Apr 01 '14 at 10:31
  • @ Reddy : Yet, it does for us. – SpikETidE Apr 01 '14 at 10:44
  • Have a look at [this](https://github.com/fge/largetext), it may help you; requires that you write the input to a file first (the bytes, of course, not the characters) and then use your regex on the file. Still, if you have OOMs, it means you have a lot of concurrent requests. – fge Apr 01 '14 at 10:46
  • Just out of curiosity, who's writing the regex, and do they know what they're doing? You should make sure the regex is as efficient as possible, because you've probably got very little wiggle room. – Alan Moore Apr 01 '14 at 11:36

1 Answer


These previous answers show a few options:

  1. Performing regex on a stream
  2. Applying a regular expression to a Java I/O Stream

I think that Scanner.findWithinHorizon, mentioned in the first answer above, may be an interesting option.
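
As a rough illustration of that idea, here is a minimal sketch that wraps an InputStream in a java.util.Scanner and searches it with findWithinHorizon. The class name StreamRegexSketch, the helper findFirstGroup, the sample pattern, and the in-memory test input are all hypothetical; in the real code the stream would come from httpget.getResponseBodyAsStream(). One caveat: with a horizon of 0 the search is unbounded, and the Scanner may end up buffering a large part of the input if no match is found, so a positive horizon may be safer when the position of the match is roughly known.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Pattern;

class StreamRegexSketch {

    /**
     * Hypothetical helper: returns capture group 1 of the first match of
     * `pattern` in the stream, or null if the pattern is not found.
     */
    static String findFirstGroup(InputStream body, Pattern pattern,
                                 String charsetName, int horizon) {
        // The Scanner decodes the bytes incrementally, so the response is
        // never materialized as one String by this code.
        try (Scanner scanner = new Scanner(body, charsetName)) {
            // horizon == 0 means "search without bound"; a positive value
            // limits how many characters past the current position are searched.
            String match = scanner.findWithinHorizon(pattern, horizon);
            if (match == null) {
                return null;
            }
            // match() exposes the groups of the last successful find.
            return scanner.match().group(1);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by the HTTP client (assumption).
        InputStream body = new ByteArrayInputStream(
                "noise noise id=12345 more noise".getBytes(StandardCharsets.UTF_8));
        Pattern pattern = Pattern.compile("id=(\\d+)");
        System.out.println(findFirstGroup(body, pattern, "UTF-8", 0));
    }
}
```

Closing the Scanner also closes the underlying stream, which is why the sketch uses try-with-resources.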

Drew MacInnis
  • Just what I was thinking! I'm not in a position to test this, but you should be able to create the Scanner directly from the stream that's returned by `getResponseBodyAsStream()`. You can specify the encoding, too. – Alan Moore Apr 01 '14 at 11:25