I just ran into some pretty weird behaviour of Scala access modifiers when bytecode generated from Scala code is used from Java code. Consider the following snippet using Spark (Spark 1.4, Hadoop 2.6):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class Test {
    public static void main(String[] args) {
        JavaSparkContext sc = 
            new JavaSparkContext(new SparkConf()
                                .setMaster("local[*]")
                                .setAppName("test"));

        Broadcast<List<Integer>> broadcast = sc.broadcast(Arrays.asList(1, 2, 3));

        broadcast.destroy(true);

        // fails with java.io.IOException: org.apache.spark.SparkException: 
        // Attempted to use Broadcast(0) after it was destroyed
        sc.parallelize(Arrays.asList("task1", "task2"), 2)
          .foreach(x -> System.out.println(broadcast.getValue()));
    }
}

This code fails, which is expected since I deliberately destroy the Broadcast before using it. The thing is that, in my mental model, it should not even compile, let alone run.

Indeed, Broadcast.destroy(Boolean) is declared as private[spark] so it should not be visible from my code. I'll try looking at the bytecode of Broadcast but it's not my specialty, that's why I prefer posting this question. Also, sorry I was too lazy to create an example that does not depend on Spark, but at least you get the idea. Note that I can use various package-private methods of Spark, it's not just about Broadcast.

Any idea what's going on?

Yuval Itzchakov
Dici

1 Answer

If we reconstruct this issue with a simpler example:

package yuvie

class X {
  private[yuvie] def destory(d: Boolean) = true
}

And inspect the compiled class with javap:

[yuvali@localhost yuvie]$ javap -p X.class 
Compiled from "X.scala"
public class yuvie.X {
  public boolean destory(boolean);
  public yuvie.X();
}

We see that private[package] in Scala becomes public in the generated bytecode, which is what Java sees. Why? Because Java's package-private access isn't equivalent to Scala's private[package]. There is a nice explanation in this post:

The important distinction is that 'private [mypackage]' in Scala is not Java package-private, however much it looks like it. Scala packages are truly hierarchical, and 'private [mypackage]' grants access to classes and objects up to "mypackage" (including all the hierarchical packages that may be between). (I don't have the Scala spec reference for this and my understanding here may be hazy, I'm using [4] as a reference.) Java's packages are not hierarchical, and package-private grants access only to classes in that package, as well as subclasses of the original class, something that Scala's 'private [mypackage]' does not allow.

So, 'private [mypackage]' is both more and less restrictive than Java package-private. For both reasons, JVM package-private can't be used to implement it, and the only option that allows the uses that Scala exposes in the compiler is 'public.'
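To make the constraint concrete: a method in a class file can only be private, package-private, protected or public, so scalac has to pick one of those for private[mypackage]. Here is a minimal sketch that reads the JVM-level access of a method through reflection, which is the same information javap -p prints. `Sample` and `AccessLevels` are made-up names for illustration, not part of Scala or Spark.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical class with one no-argument method per JVM access level.
class Sample {
    private void hidden() {}
    void packagePrivate() {}
    protected void inherited() {}
    public void open() {}
}

class AccessLevels {
    // Returns the access level recorded in the class file for a no-argument method.
    static String accessOf(Class<?> type, String methodName) {
        try {
            Method m = type.getDeclaredMethod(methodName);
            int mods = m.getModifiers();
            if (Modifier.isPublic(mods)) return "public";
            if (Modifier.isProtected(mods)) return "protected";
            if (Modifier.isPrivate(mods)) return "private";
            // No access flag set means package-private.
            return "package-private";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"hidden", "packagePrivate", "inherited", "open"}) {
            System.out.println(name + " -> " + accessOf(Sample.class, name));
        }
    }
}
```

There is no fifth level meaning "visible to this package and everything nested under it", which is why a Scala qualified private has nowhere to go but public.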

Yuval Itzchakov
  • Thanks for the answer. Don't you think this is a bit dangerous for API writers? Functionality they never wanted to expose ends up plainly visible from Java. I wonder if they could use some annotation trick to generate warnings for users who try to use a member that was intended to be private. – Dici Jun 11 '16 at 09:56
  • @Dici If you plan to interop with Java then yes, I definitely think it is something you have to take into consideration, especially if this exposes internals you don't want clients to invoke. Although in this particular case, you could also call the public `Broadcast.destroy` method, shooting yourself in the foot equivalently. – Yuval Itzchakov Jun 11 '16 at 10:11
  • Yup, what I meant is that now that I know all Spark internals declared as package-private are exposed through the Java API, I think there should probably be more Java wrappers to hide functionality that wasn't intended to be public. My example was just to show the method was actually called. – Dici Jun 11 '16 at 10:28
  • @Dici This would perhaps be a good question for the [Spark mailing list](http://apache-spark-user-list.1001560.n3.nabble.com/) – Yuval Itzchakov Jun 11 '16 at 10:39
  • Anyone who wants to access stuff they're not supposed to could just use reflection anyway. – Antimony Jun 11 '16 at 14:17
  • @Antimony You're right, but the effort of using reflection usually makes people think it's not worth it, unlike simply invoking a method which is right there. I agree that this can be problematic. – Yuval Itzchakov Jun 11 '16 at 14:43
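For reference, this is roughly what the reflection route mentioned in the comments looks like. It is only a sketch: `Resource` and `ReflectiveCall` are hypothetical stand-ins, not Spark classes, and the private `destroy(boolean)` here just returns a string so the call can be observed.

```java
import java.lang.reflect.Method;

// Hypothetical library type with an internal method that really is private in bytecode.
class Resource {
    private String destroy(boolean blocking) {
        return "destroyed(blocking=" + blocking + ")";
    }
}

class ReflectiveCall {
    // Invokes the private destroy(boolean) method by looking it up reflectively.
    static String callDestroy(Resource target, boolean blocking) {
        try {
            Method m = Resource.class.getDeclaredMethod("destroy", boolean.class);
            m.setAccessible(true); // bypasses the Java-language access check
            return (String) m.invoke(target, blocking);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callDestroy(new Resource(), true));
    }
}
```

Compare that with the one-line `broadcast.destroy(true)` from the question: the reflective version needs the method name as a string, the parameter types, an explicit `setAccessible` call and exception handling, which is the extra effort the last comment refers to.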