2

Some distributed computing engines such as Spark or Flink are able to distribute code between computers and jvm, such as (in scala with spark):

sc.parallelize(1 to 10).map(i => i+1).collect

Here, the i => i+1 is serialized, send and executed on all worker. I would like to know how this is done?

Also I'd appreciate if anyone can point me to the source code (classes) that are related to this issue in some existing distributed-computing framework such as Spark/Flink

Juh_
  • 14,628
  • 8
  • 59
  • 92
  • 1
    From what I could find, RMI might be the tool that is provided by Java – XtremeBaumer Apr 03 '18 at 14:18
  • using "RMI" in my search I found this question https://stackoverflow.com/q/36461299/1206998 which answer has a comment saying RMI is not used (obsolete and too slow) :-\ – Juh_ Apr 03 '18 at 14:55
  • There are a couple of close votes. Can you comment please? I tried to be as precise as I can and I just added an possible practical example of what I expect is done. But I simply don't know how it works so I might not have the proper words – Juh_ Apr 03 '18 at 15:21
  • I edited the question to make it more precise – Juh_ Apr 04 '18 at 13:59

1 Answers1

1

Edit: this answer was done before an update of the question which was marked as "too broad".

The code is loaded from its classes and the classes are loaded via the ClassLoader, Every time you create a thread you can set your new classLoader before you start it.

Given this abilities, you can

  • Simply download jars and create a new UrlClassLoader accessing theses jars on the disk.
  • Create your custom classloader to load specific class at runtime (from network or other ...)
  • Use any technology that permits hot code loading: OSGI is one of them.

The classLoaders are hierarchic, if a class is not found by yout classLoader it is asked to its parent. Here is the default hierarchy:

  • bootStrap classLoader
  • extension ClassLoader
  • System classpath classloader. system classpath

Application server like tomcat glassfish or wildfly add a ClassLoader for every EAR or WAR that is loaded, permitting a dynamic loading of applications.

Juh_
  • 14,628
  • 8
  • 59
  • 92
pdem
  • 3,880
  • 1
  • 24
  • 38
  • You explain how to distribute compiled code/jar. But this is not what I meant. What I am looking for is how to distribute "live" code. Something like `String code = serialize(new MyTrait{ ... })`, send it, and then `MyTrait t = deserialize(code)` on the other jvm – Juh_ Apr 03 '18 at 14:43
  • I don't understand, you want hot compilation or distubute local compiled class? a loaded class has bytecode and you need that byte code to create a class. There are also bytecode generators but is is a more general problem – pdem Apr 03 '18 at 14:47
  • I am not sure which word to use, but I guess distributing "local compiled classes" would be a alright. To start, distributing a "locally compiled function" would be nice – Juh_ Apr 03 '18 at 14:53
  • There are no function concept in java, only Class that contains method, and that cannot be separated. and dont forget that all is compiled. So all the question is what you want to send over the network? the data?=> classic serilisation (JSON, binary , other), the source?=> need compilation, the compiled Class?=> the process that I already explained – pdem Apr 03 '18 at 14:58
  • To put it simply, in spark and writing in scala, you can do `sc.parallelize(1 to 1000).map(i => i+1).collect`. `parallelize` will distribute the data and the map will send the function `i => i+1` and execute it on the distributed data. So on other jvms. How it does the code distribution is what I would like to understand. And for this example it don't distribute jar – Juh_ Apr 03 '18 at 15:16
  • ok, If this is java code, there is bytecode (the compiled java) that is sent for you, with classloader overriding, this is not a simple operation. And the object you pass to map method is an object implementing a lambda (This is not the simplest class in java) If you want ot do some research about it, thing about sending your class over the network (not necesserarily all the jar) you need to manipulate bytecode see: https://stackoverflow.com/questions/17321039/get-bytecode-from-loaded-class – pdem Apr 03 '18 at 15:27
  • Is a Java source mandatory for you? What could be simple is using a interpreted language such as groovy or javascript, and it can be called from java. – pdem Apr 03 '18 at 15:28