I have a List of DTOs mapped from an HTTP response (via a RestTemplate call), each with two values: id and content. While iterating over the list, I unescape HTML characters in content and replace some unwanted characters using the code below:

    String content = null;
    for(TestDto testDto: testDtoList) {
        try {
            content = StringEscapeUtils.unescapeHtml4(testDto.getContent()).
                                       replaceAll("<style(.+?)</style>", "").
                                       replaceAll("<script(.+?)</script>", "").
                                       replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ").
                                       replaceAll("[^a-zA-Z0-9\\\\.]+", " ").
                                       replace("\\n", " ").
                                       replaceAll("\\\\r","").trim();
            processContent(content);
        } catch (Exception e) {
          System.out.println("Content err: " + e.getMessage());
        }
    }

Partway through the loop, the code halts with a Java "constant string too long" exception. I am not even able to catch this exception. How should I handle this problem?

EDIT :

The length of the getContent() String can exceed Integer.MAX_VALUE.

AxelH
RGG
  • Maybe try using `org.apache.commons.lang.StringUtils.join`? – Idos Jan 17 '17 at 10:04
  • Possible duplicate of [Java "constant string too long" compile error. Only happens using Ant, not when using Eclipse](http://stackoverflow.com/questions/2738574/java-constant-string-too-long-compile-error-only-happens-using-ant-not-when). Don't think this is related to Ant – AxelH Jan 17 '17 at 10:04
  • @AxelH No it is not related to Ant – RGG Jan 17 '17 at 10:22
  • That's what I just said ... the duplicate question talked about Ant, but that is not the reason. Read that thread. PS: always post the exception/error/throwable message **and the stack trace** – AxelH Jan 17 '17 at 10:27
  • I believe you stripped the example too much. The assignment `content =` inside the loop is never used. After the loop, `content` holds its last assignment. So what is the sense of doing it inside a loop? – SubOptimal Jan 17 '17 at 10:29
  • @Idos How can I use that?..String is not an array – RGG Jan 17 '17 at 10:33
  • @SubOptimal, the loop is there to execute this logic on every DTO, so the content is different on each iteration. What he will do with it is not really relevant here (or is it?) – AxelH Jan 17 '17 at 10:35
  • @SubOptimal Yes I stripped the example..it is used in call of another function just after assignment. – RGG Jan 17 '17 at 10:35
  • Could you run the above code with the content value which leads to the exception and mark the line which throws it? – SubOptimal Jan 17 '17 at 11:53

3 Answers

0

That code is hard to read anyway, so you might want to refactor it. One thing you could try is to use a StringBuffer along with Pattern, Matcher and the appendReplacement() and appendTail() methods. That way you could provide a list of patterns and replacements, iterate over it, and for each pattern iterate over all of its occurrences and replace them. Unfortunately those methods don't accept StringBuilder (the StringBuilder overloads were only added in Java 9), but it might at least be worth a try. In fact, the replaceAll() method basically does the same thing, but by doing it yourself you could skip the return sb.toString(); part, which is probably what causes the problem.

Example:

    class ReplacementInfo {
      String pattern;
      String replacement;
    }

    List<ReplacementInfo> list = ...; //build it

    StringBuffer input = new StringBuffer( testDto.getContent() );
    StringBuffer output = new StringBuffer( );

    for( ReplacementInfo replacementInfo : list ) {
      //create the pattern and matcher for the current input
      Pattern pattern = Pattern.compile( replacementInfo.pattern );
      Matcher matcher = pattern.matcher( input );

      //replace all occurrences of the pattern
      while( matcher.find() ) {
        matcher.appendReplacement( output, replacementInfo.replacement );
      }
      //add the rest of the input
      matcher.appendTail( output );

      //reuse the output as the input for the next iteration
      input = output;
      output = new StringBuffer();
    }

At the end, input would contain the result, unless you handle reusing the intermediate buffers differently, e.g. by clearing the buffers and appending output to input, thus keeping output until the next iteration.
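For reference, here is a self-contained sketch of the same buffer-to-buffer idea that can be run as is. The class name `ReplacementChain`, the helper names `applyAll` and `clean`, and the sample input are made up for illustration. The patterns are adapted from the question rather than copied (the `(?s)` flag was added to the style/script patterns and the backslash was dropped from the character class), so treat them as assumptions and adjust them to your data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementChain {

    // Pattern -> replacement pairs (illustrative), applied in insertion order.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("(?s)<style.+?</style>", "");
        REPLACEMENTS.put("(?s)<script.+?</script>", "");
        REPLACEMENTS.put("(?s)<[^>]*>(\\s*<[^>]*>)*", " ");
        REPLACEMENTS.put("[^a-zA-Z0-9.]+", " ");
    }

    // Runs every replacement over the content, passing a StringBuffer from step to step.
    static StringBuffer applyAll(CharSequence content, Map<String, String> replacements) {
        StringBuffer input = new StringBuffer(content);
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            Matcher matcher = Pattern.compile(entry.getKey()).matcher(input);
            StringBuffer output = new StringBuffer();
            while (matcher.find()) {
                // quoteReplacement makes '$' and '\' in the replacement text literal
                matcher.appendReplacement(output, Matcher.quoteReplacement(entry.getValue()));
            }
            // copy whatever follows the last match
            matcher.appendTail(output);
            // the output of this step is the input of the next one
            input = output;
        }
        return input;
    }

    // Convenience wrapper; a String is only created once, at the very end.
    static String clean(CharSequence content) {
        return applyAll(content, REPLACEMENTS).toString().trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p><script>var x = 1;</script><b>World 42.</b>";
        System.out.println(clean(html)); // prints "Hello World 42."
    }
}
```

Note that this only avoids the intermediate String copies. Each StringBuffer is still backed by a single array, so it does not lift the overall size limit discussed in the comments.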

Btw, you might also want to look into using StringEscapeUtils.UNESCAPE_HTML4.translate(input, writer) along with a StringWriter, which lets you access the underlying StringBuffer and thus operate on the content entirely without using String.

Thomas
  • Nice generic logic. But correct me if I am wrong: the resulting String will be the same, so this won't fit in a String either. Right? – AxelH Jan 17 '17 at 10:32
  • @AxelH that might be the case but it depends on what `testDto.getContent()` is and what you need. The code above would allow you not to use strings at all but work on `CharSequence` instances only. It also depends on where exactly you get that exception because the actual problem might be something else. – Thomas Jan 17 '17 at 10:35
  • But if I am not mistaken, the exception should occur because the length is greater than `Integer.MAX_VALUE`, the limit of String and also the limit of arrays. Since `CharSequence` uses `charAt(int)`, it might be limited there too. (only a guess ;) ) – AxelH Jan 17 '17 at 10:41
  • @AxelH that might be the case but strings with that size might be problematic anyways. What's the largest content you expect? – Thomas Jan 17 '17 at 10:43
  • @AxelH The above reason is right: the String length exceeds Integer.MAX_VALUE – RGG Jan 17 '17 at 10:45
  • **I** don't expect anything ;) But this exception leads me to believe this is already the case, so I believe this should be done using a file and reading it line by line (see the duplicate). – AxelH Jan 17 '17 at 10:46
  • @AxelH ah yes, mixed up the "players" here ;) – Thomas Jan 17 '17 at 10:47
  • @RGG so the input can be larger than 2,147,483,647 characters, which would already take 4GB of memory? In that case you'll _really_ want to work with files and only check smaller chunks. But what data do you work with that a single DTO can get that huge? – Thomas Jan 17 '17 at 10:50
  • @Thomas I know it is very rare that a string is too long. But in my use case there are many such cases. – RGG Jan 17 '17 at 10:50
  • No problem ;) For information, here are [the limitations of String/Array](http://stackoverflow.com/a/1179996/4391450); depending on the heap size, this could be smaller. But I would still use a file to store this text and work line by line. (Sorry for the spam, I'll stop here) – AxelH Jan 17 '17 at 10:50
0

Change your catch block as shown below:

    String content = null;
    for(TestDto testDto: testDtoList) {
        try {
            content = StringEscapeUtils.unescapeHtml4(testDto.getContent()).
                                   replaceAll("<style(.+?)</style>", "").
                                   replaceAll("<script(.+?)</script>", "").
                                   replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ").
                                   replaceAll("[^a-zA-Z0-9\\\\.]+", " ").
                                   replace("\\n", " ").
                                   replaceAll("\\\\r","").trim();
        } catch (ContentTooLongException e) {
            System.out.println("Content err: " + e.getMessage());
        } catch (Exception e) {
            System.out.println("other err: " + e.getMessage());
        }
    }

Now you'll be able to handle any exception.

Anil Agrawal
  • Catching an exception should never be THE solution. Especially if there is nothing done to handle it. – AxelH Jan 17 '17 at 10:58
  • Actually it's difficult to identify the issue without knowing the actual response. Maybe it's related to some character encoding issue or something else. At least after handling the issue we can take care of the response as per requirement instead of throwing an exception to the server. – Anil Agrawal Jan 17 '17 at 11:02
  • See [Is it a bad practice to catch Throwable?](http://stackoverflow.com/q/6083248/4391450). – AxelH Jan 17 '17 at 11:06
  • Instead of catching Throwable we can catch desired exceptions like : catch(ContentTooLongException e) {}catch(Exception e){} – Anil Agrawal Jan 17 '17 at 11:12
  • Last comment (too chatty here ;) ): You are still simply catching an exception where OP wants to prevent it. So you are just saying "Ignore those DTOs, they're not that important". – AxelH Jan 17 '17 at 11:18
  • It's not the actual solution, but it's better to return a meaningful response (even if it's an error message) to the client rather than a server exception. For a complete solution we need to analyse the actual DTOs – Anil Agrawal Jan 17 '17 at 11:24
0

Supposing your DTO isn't big enough, you could:

  1. store the response in a temporary file,
  2. add a catch clause for the specific exception that is thrown at runtime, and put the handling code for it inside that clause.

That way you can parse the strings, and when the exception hits, you can handle the long string by splitting it into parts and cleaning each part.
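A minimal sketch of the temporary-file idea, assuming the response body is available as an InputStream and that the content contains line breaks. The class name `ChunkedCleaner` and the methods `spoolToTempFile`, `cleanFile` and `cleanLine` are hypothetical, and the per-line cleanup is a simplified stand-in for the replacement chain in the question:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ChunkedCleaner {

    // Simplified cleanup for one line: drop tags, then collapse unwanted characters.
    static String cleanLine(String line) {
        return line.replaceAll("<[^>]*>", " ")
                   .replaceAll("[^a-zA-Z0-9.]+", " ")
                   .trim();
    }

    // Step 1: write the response body to a temporary file instead of building one String.
    static Path spoolToTempFile(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("content-", ".tmp");
        Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Step 2: clean the file one line at a time, so only a single line is in memory.
    static Path cleanFile(Path source) throws IOException {
        Path target = Files.createTempFile("content-clean-", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(cleanLine(line));
                writer.newLine();
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body
        InputStream body = new ByteArrayInputStream(
                "<p>Hello</p>\n<b>World 42.</b>\n".getBytes(StandardCharsets.UTF_8));
        Path cleaned = cleanFile(spoolToTempFile(body));
        // prints "Hello" and "World 42." on separate lines
        Files.readAllLines(cleaned, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Two caveats apply: tags or script/style blocks that span several lines are not handled by a per-line cleanup, and a body without any line breaks would still end up in memory as one line, in which case reading fixed-size character chunks would be the safer variant.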