0

Hi I'm writing a program that parses a String into individual component as but when I try to test it out, I get an Out of Memory Error. I feel as if my for/while loops are infinite but I can't seem to find the reason why.

    //for loop to loop through char of string
    for(int i=0; i<expressionString.length(); i++) {

        //cast char into ascii int
        int ascii = (int) charAt(i);

        //appending to token if one of  singly operator symbols: *,/,(,),[,]
        if(ascii == 40 || ascii == 41 || ascii == 42 || ascii == 47 || ascii == 91 || ascii == 93){
            token.append((char) ascii);
            tokenList.add(token.toString());

        } //append if +, -
        else if(ascii == 43 || ascii == 45) {
            token.append((char) ascii);

            //check next char if + or /, if so append to token again
            int nextChar = (char) charAt(i+1);
            if(nextChar == 43 || nextChar == 45) {
                token.append((char) nextChar);
            }
            tokenList.add(token.toString());

        } //appending to token if it's a num
        else if ( ascii >= 48 || ascii <=57) {
            token.append((char) ascii);

            //check if next char is a num
            while ((int) charAt(i+1) >= 48 || (int) charAt(i+1) <= 57) {
                //increment i in for loop to check
                i++;
                token.append((int) charAt(i));
            }
            tokenList.add(token.toString());
        }
        //  
    }

Error

Please let me know if this is an error with my code as I can't seem to detech where the problem is. Thank you!

Sam
  • 495
  • 3
  • 11
  • 19
  • 1
    What is line 51? That seems to be where the exception occurs. Also: I think you need an `&&` here and not a `||`. `( ascii >= 48 || ascii <=57)` – MFisherKDX Oct 22 '17 at 17:10
  • Aside: you wouldn't need to comment about "`*,/,(,),[,]`" if you just used `ascii == '*'` etc in the condition (or `"*/()[]".indexOf((char) ascii) >= 0`). – Andy Turner Oct 22 '17 at 17:11
  • 1
    If @MFisherKDX 's hint doesn't help provide whole class please. – Russiancold Oct 22 '17 at 17:12
  • 3
    You are forever appending to the end of `token`, never removing from it. Do you intend to do that, or should you be deleting its contents after the `tokenList.add` calls? – Andy Turner Oct 22 '17 at 17:16
  • `tokenList` is an array list and I'm trying to append each `token` to the array list separately. I'm not sure I understand what you mean about deleting its content -- why would I need to delete it? If you could elaborate a bit more, @AndyTurner – Sam Oct 22 '17 at 17:19
  • @Sam the first time you call `tokenList.append`, you append, say `*`. Next time, you append `*+`. The time after, you append `*+0`. The time after, you append `*+0*`. You are appending what you appended last time, plus some extra stuff from this time. Do you intend to do this? – Andy Turner Oct 22 '17 at 17:21
  • In other words your memory consumption is huge because your substrings are getting longer and longer but don't share any data – cpp beginner Oct 22 '17 at 17:25
  • Oh if I'm not mistaken, I should add `token.delete(0, token.length());` after `tokenList.append`. Was that an issue with out of memory error as well or is that different? @AndyTurner – Sam Oct 22 '17 at 17:25
  • @MFisherKDX @Russiancold line 51 is from a tester class: `exp = new Expression(expString);` Is this the issue? – Sam Oct 22 '17 at 17:28
  • @Sam given that you are OOM'ing on the reallocation of a larger internal buffer in `StringBuilder`, it will undoubtedly help. But note that you don't need the `StringBuilder`, you could just append `expressionString.subString(i, someEndPoint)`, where `sEP` is the index of the character after the last one you append to `token` at present. – Andy Turner Oct 22 '17 at 17:28
  • @AndyTurner What I'm trying to do is take different parts of a mathematical expression for ex. `(++3)` and parse it into individual components to add to the array list so it's `"(", "++", "3", ")"`. Can I still append if I have repeats as in the case of the increment/decrement operators so that it still remains as ++/--? – Sam Oct 22 '17 at 17:35
  • Which one is line 62 in Expression.java? And what is `expressionString.length()` (i.e. how long is your input string?) – Thomas Kläger Oct 22 '17 at 17:41

2 Answers2

1

Here is simplified version of what you are doing in that loop.

public class Main {

    public static void main(String[] args) {
        String str = "ABCDE";

        StringBuilder sb = new StringBuilder();
        List<String> list = new ArrayList<>();
        for (char c : str.toCharArray()) {
            sb.append(c);                     
            list.add(sb.toString());  // <-- Problem! This adds the *entire* contents of the StringBuilder as a new String to the list.
        }

        System.out.println(list);
    }
}

This program prints

[A, AB, ABC, ABCD, ABCDE]

This is because each time we append a char to the StringBuilder, we then add the entire contents of the StringBuilder as a new String to the ArrayList.

Now suppose we replace "ABCDE" with a String of length 1000000, for example we change the first line to

String str = Stream.generate(() -> "A").limit(1000000).collect(Collectors.joining()); // String of length 1000000

We are now trying to create 1000000 String objects of lengths from 1 to 1000000, with predictable results.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at my_package.Main.main(Main.java:17)

How to fix it? It depends on what you are trying to do (and we don't have all the context), but I suspect you don't need both a StringBuilder and a List.

Paul Boddington
  • 37,127
  • 10
  • 65
  • 116
1

As I indicated in the comments, the fact that you are appending to a StringBuilder without ever removing anything from it is dubious.

StringBuilder is simply a wrapper around a char[], which gets automatically resized when necessary to accommodate the new text that you are trying to append. You can see in the stack trace that the OOM is occurring during one of these automatic resizes.

One solution to this problem is simply to allocate a large enough buffer initially, then the resizes don't need to occur until much more text has been appended to the StringBuilder:

StringBuilder token = new StringBuilder(MAXIMUM_EXPECTED_SIZE);

The problem with this is that it may be hard to determine MAXIMUM_EXPECTED_SIZE; and moreover that you may be wasting a lot of memory most of the time, where you are appending nowhere near that amount of text to the buffer.

It seems like you don't actually want to keep the text in token once you have transferred it to tokenList. You can delete it explicitly from the buffer using:

token.delete(0, token.length());
// or
token.setLength(0);

(Actually, this doesn't delete the data, it just allows subsequent appends to overwrite it)

But this is still wasteful: you don't need the StringBuilder at all.

Consider how you handle the numbers:

     if ( ascii >= 48 || ascii <=57) {
        token.append((char) ascii);

        //check if next char is a num
        while ((int) charAt(i+1) >= 48 && (int) charAt(i+1) <= 57) {
                                   //  ^^ NB
            //increment i in for loop to check
            i++;
            token.append((int) charAt(i));
        }
        tokenList.add(token.toString());
    }

What you're apparently trying to do here is to append everything between the i-th character (inclusive) and the j-th character (exclusive), where j points either to the end of the string, or to a non-numeric character. So you can do that like this:

     if ( ascii >= 48 || ascii <=57) {
        int j = i + 1;

        //check if next char is a num
        while (j < expressionString.length() && charAt(j) >= '0' && charAt(j) <= '9') {
            j++;
        }
        tokenList.add(expressionString.subString(i, j));
        i = j;
    }

You can do similarly for the other appending of tokens. This just cuts out the "middle man" of the StringBuilder, which obviously avoids problems with it re-allocating its internal buffer.

Andy Turner
  • 137,514
  • 11
  • 162
  • 243