I have an RDD which is distributed across multiple machines in a Spark environment. I would like to execute a function on this RDD on each worker machine. I do not want to collect the RDD and then execute the function on the driver; the function should be executed separately on each executor against that executor's own partition of the RDD. How can I do that?
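For a plain RDD, I believe something like foreachPartition runs the given function on the executors rather than the driver; a minimal sketch of what I mean (myRdd and processRecord are placeholder names):

myRdd.foreachPartition { partition =>
  // this block runs once per partition, on the executor holding it
  partition.foreach(record => processRecord(record))
}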
Update (adding code): I am running all this in the Spark shell.
import org.apache.spark.sql.cassandra.CassandraSQLContext
import java.util.Properties
val cc = new CassandraSQLContext(sc)
// cc.sql actually returns a DataFrame, despite the variable name
val rdd = cc.sql("select * from sams.events where appname = 'test'")
val df = rdd.select("appname", "assetname")
Here I have a df with 400 rows. I need to save this df to a SQL Server table. When I try to use the df.write method it gives me errors, which I have posted in a separate thread: spark dataframe not appending to the table.
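For reference, the failing attempt was roughly like this (the URL, table name, and credentials here are placeholders):

val props = new Properties()
props.setProperty("user", "MyUserName")
props.setProperty("password", "*****")
// this is the call that errors out (details in the linked thread)
df.write.mode("append").jdbc(
  "jdbc:sqlserver://localhost:1433;databaseName=AdventureWorks", "dbo.Events", props)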
I can open a DriverManager connection and insert rows, but that would run in the driver module of Spark:
import java.sql._
import com.microsoft.sqlserver.jdbc.SQLServerDriver
// build the connection URL (credentials elided)
val connectionUrl = "jdbc:sqlserver://localhost:1433;" +
  "databaseName=AdventureWorks;user=MyUserName;password=*****;"
// open a connection -- this happens on the driver
val conn = DriverManager.getConnection(connectionUrl)
// create a Statement from the connection
val statement = conn.createStatement()
// insert the data
statement.executeUpdate("INSERT INTO Customers VALUES (1001, 'Simpson', 'Mr.', 'Springfield', 2001)")
I need to do this write on the executor machines instead. How can I achieve this?
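Is something like df.foreachPartition the right approach, opening one JDBC connection per partition so the inserts happen on the executors? A rough sketch of what I have in mind (the target table name Events is a placeholder; the columns match my df above, and I am not sure this is correct):

df.foreachPartition { rows =>
  // this closure runs on the executor that owns the partition
  val conn = DriverManager.getConnection(connectionUrl)
  val stmt = conn.prepareStatement(
    "INSERT INTO Events (appname, assetname) VALUES (?, ?)")
  rows.foreach { row =>
    stmt.setString(1, row.getString(0)) // appname
    stmt.setString(2, row.getString(1)) // assetname
    stmt.executeUpdate()
  }
  stmt.close()
  conn.close()
}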