
I have 81K records of type Long and I am trying to store them in a HashSet. My code snippet looks like this:

private static HashSet<Long> hashSet = new HashSet<>(Arrays.asList(*81K records*));

Compiling this gives me a StackOverflowError. I don't understand why only 81K records are a problem here. Solutions are appreciated.

Java version:

openjdk version "1.8.0_322"
OpenJDK Runtime Environment Corretto-8.322.06.1 (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM Corretto-8.322.06.1 (build 25.322-b06, mixed mode)

Stack Trace:

    [javac]
    [javac]
    [javac] The system is out of resources.
    [javac] Consult the following stack trace for details.
    [javac] java.lang.StackOverflowError
    [javac]     at com.sun.tools.javac.code.Type.map(Type.java:220)
    [javac]     at com.sun.tools.javac.code.Type.map(Type.java:220)
    [javac]     at com.sun.tools.javac.code.Type.map(Type.java:220)
    [javac]     …

Line 220 of Type:

 208     /**
 209      * Return the least specific subtype of t that starts with symbol
 210      * sym.  If none exists, return null.  The least specific subtype
 211      * is determined as follows:
 212      *
 213      * <p>If there is exactly one parameterized instance of sym that is a
 214      * subtype of t, that parameterized instance is returned.<br>
 215      * Otherwise, if the plain type or raw type `sym' is a subtype of
 216      * type t, the type `sym' itself is returned.  Otherwise, null is
 217      * returned.
 218      */
 219     public Type asSub(Type t, Symbol sym) {
 220         return asSub.visit(t, sym);
 221     }
 222     // where
 223         private final SimpleVisitor<Type,Symbol> asSub = new SimpleVisitor<Type,Symbol>() {
ajparth
    A _stack_ overflow, as opposed to a _memory_ overflow, implies some recursion has gotten out of hand. You would have gotten a stack trace. Show it to us. – Louis Wasserman Sep 01 '22 at 18:15
  • @mlecz No, it's during compilation. – ajparth Sep 01 '22 at 18:18
  • I believe this limit was raised in later Java versions (it used to be 65535), but passing 81,000 records as an _array_ (which varargs are) could very possibly run afoul of the stack's capabilities. You may in fact need the `Xmn` argument as well if you want to manually push it through. – Rogue Sep 01 '22 at 18:19
  • @LouisWasserman stack trace updated. – ajparth Sep 01 '22 at 18:21
  • Are there any lines in that stack trace that aren't the `com.sun.tools....Type.map` line? Also, include the specific version/build of Java you are using. – Rogue Sep 01 '22 at 18:24
    Just for clarification: you don’t have an array of 81k elements, but pass 81k actual arguments to `Arrays.asList`? This might be related to [this question](https://stackoverflow.com/q/31105642/2711488). There are practical limits to compiler implementations. But the initialization of an array with 81k elements wouldn’t fit into a single method anyway. – Holger Sep 01 '22 at 18:30
    Anecdotally, data doesn't define behavior, and rarely belongs in the code itself. It sounds like this is data that you could _load_ from somewhere (a file, a database, etc), and operate on it in a streamable fashion (or even all at once, if that's what you truly want). – Rogue Sep 01 '22 at 18:37
  • Yes @Holger, I also tried using an array of 81k elements. It doesn't fit within the method size limit. – ajparth Sep 01 '22 at 18:40
  • The stack trace seems pretty clearly to suggest that the issue is that the _compiler_ is having difficulty _type checking_ such a long `ArrayList`. Try marking the type explicitly `ArrayList` and see if that helps. (Alternately, try loading the data from a file instead of hardcoding it; that should almost certainly fix the issue, judging by the stack trace.) – Louis Wasserman Sep 01 '22 at 18:45
  • Voted to reopen as well. This bug makes lots of sense, it's reproducible, it's a good question. – Louis Wasserman Sep 01 '22 at 18:47
    @LouisWasserman this doesn’t help with such code, as it can’t be compiled anyway. When using a more recent JDK instead of Java 8, the compiler simply produces the error message “code too large” (and Eclipse behaves similarly; it says “The code for the static initializer is exceeding the 65535 bytes limit”). – Holger Sep 01 '22 at 18:47
  • Well, sure, that's the other issue. You just straight up cannot bake this much data into a source file. That's why I suggested the alternate approach if the first one failed. – Louis Wasserman Sep 01 '22 at 18:47
  • Yes @Rogue, I could have used an S3 bucket or a DB, but I thought 81k records would be manageable in a variable and would also perform better than DB access. – ajparth Sep 01 '22 at 18:49
  • It can certainly be manageable in a variable, you just can't bake it into the `.java` source code. – Louis Wasserman Sep 01 '22 at 19:01

2 Answers


The HashSet is irrelevant here. The problematic part is the varargs invocation of Arrays.asList with 81,000 elements.
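A varargs call is shorthand: the compiler builds an array from the arguments and passes that array to the method. A minimal sketch with three values instead of 81,000 (the class name `VarargsDemo` is only for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDemo {
    // Written with varargs: the compiler creates a Long[] holding the boxed arguments.
    static List<Long> viaVarargs() {
        return Arrays.asList(1L, 2L, 3L);
    }

    // Roughly what the call above compiles to, written out by hand.
    static List<Long> viaExplicitArray() {
        return Arrays.asList(new Long[] {1L, 2L, 3L});
    }

    public static void main(String[] args) {
        System.out.println(viaVarargs().equals(viaExplicitArray())); // prints "true"
    }
}
```

With 81,000 arguments in a static field initializer, the code that fills this array ends up in the class's static initializer method, which is where the method size limit comes into play.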

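To reproduce the issue, we can use the following code:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class Tmp {
  // Marker line in the template; replaced by the generated varargs arguments
  static final String ARGUMENTS = "<<INSERT ARGUMENTS HERE>>";

  static final List<String> TEMPLATE = Arrays.asList(
      "import java.util.Arrays;",
      "import java.util.List;",
      "",
      "class Tmp {",
      "  static final List<Integer> L = Arrays.asList(",
           ARGUMENTS,
      "  );",
      "}");

  public static void main(String[] args) throws IOException {
    // Write the generated source to a temporary file
    Path p = Files.createTempFile("Test", ".java");
    Files.write(p, () -> TEMPLATE.stream()
        .flatMap(line -> line.equals(ARGUMENTS)? varargsArgument(): Stream.of(line))
        .iterator());
    // Compile it with the javac of the running JDK
    JavaCompiler c = ToolProvider.getSystemJavaCompiler();
    c.run(System.in, System.out, System.err, p.toString());
  }

  // 8,100 lines with 10 comma-separated numbers each: 81,000 arguments in total
  static Stream<CharSequence> varargsArgument() {
    return IntStream.range(0, 8100).mapToObj(i -> IntStream.range(0, 10)
            .mapToObj(j -> i * 10 + j + (i < 8099 || j < 9? ", ": ""))
            .collect(Collectors.joining()));
  }
}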

With OpenJDK 8, it produces the same error:

java.lang.StackOverflowError
    at com.sun.tools.javac.code.Type.map(Type.java:220)
   …

On recent JDKs, e.g. JDK 12, it produces:

/tmp/Test14992292170362927520.java:6: error: code too large
  static final List<Integer> L = Arrays.asList(
                             ^

showing that even with the compiler bug fixed, such code can’t be compiled.

Such an amount of data should be included as an embedded resource that you read in once at startup.
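A minimal sketch of that approach, assuming a hypothetical resource file `/longs.txt` on the classpath with one number per line; the class and method names are likewise made up for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class LongResourceLoader {
    // Hypothetical resource name: a text file with one long per line, packaged on the classpath.
    static final String RESOURCE = "/longs.txt";

    // Holder idiom: the resource is read once, the first time values() is called.
    private static final class Holder {
        static final Set<Long> VALUES = loadFromClasspath();
    }

    static Set<Long> values() {
        return Holder.VALUES;
    }

    static Set<Long> loadFromClasspath() {
        // getResourceAsStream returns null if the resource is missing
        try (InputStream in = LongResourceLoader.class.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                throw new IllegalStateException("resource not found: " + RESOURCE);
            }
            return readLongs(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses one long per line, skipping blank lines; the caller closes the stream.
    static Set<Long> readLongs(InputStream in) throws IOException {
        Set<Long> result = new HashSet<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                result.add(Long.parseLong(line));
            }
        }
        return result;
    }
}
```

Code that previously used the hard-coded set would call `LongResourceLoader.values().contains(id)` instead, so the data lives in the JAR but not in any method's bytecode.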

Holger

The specific issue is that Java type inference cannot deal with such a long list of constants, as reflected by the stack overflow you got in the Java compiler itself. But it's also the case that the Java bytecode format does not allow you to put such large amounts of data into your source code. The maximum size of the code for a method, which is how Java initializes this behind the scenes, is 64KB (65,535 bytes); storing 81K long constants alone is about 10 times more than this limit.
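As a rough check of the "10 times" estimate, counting only the 8 bytes of each long value against the 65,535-byte limit on a method's code:

$$
81\,000 \times 8\ \text{bytes} = 648\,000\ \text{bytes} \approx 9.9 \times 65\,535\ \text{bytes}
$$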

You can certainly store this data in a HashSet and the like, but you must load it at runtime from a file.
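If the values are only ever read by this program, a compact binary file is one option. A minimal sketch, assuming a hypothetical data file written once with `DataOutputStream.writeLong` and read back at startup with `DataInputStream.readLong`; the class name `LongFileStore` is illustrative only:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class LongFileStore {
    // One-time export: writes each value as 8 big-endian bytes.
    static void write(Path file, Collection<Long> values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (long v : values) {
                out.writeLong(v);
            }
        }
    }

    // At startup: the file size tells us how many longs to read.
    static Set<Long> read(Path file) throws IOException {
        long count = Files.size(file) / Long.BYTES;
        Set<Long> result = new HashSet<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (long i = 0; i < count; i++) {
                result.add(in.readLong());
            }
        }
        return result;
    }
}
```

For example, `LongFileStore.read(Paths.get("ids.bin"))` at startup, where `ids.bin` is a hypothetical file name. A text file is easier to inspect by hand; the binary form skips parsing and takes exactly 8 bytes per value.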

Louis Wasserman