
I use this simple code to calculate recommendations from a command-line Java app:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Rating is my own bean class with a static parseRating(String) method
    SparkSession spark = SparkSession
            .builder()
            .appName("SomeAppName")
            .config("spark.master", "local")
            .config("spark.executor.instances", 1) // not sure whether this has any effect here
            .config("spark.executor.cores", 3)     // not sure whether this has any effect here
            .getOrCreate();
    JavaRDD<Rating> ratingsRDD = spark
            .read().textFile(args[0]).javaRDD()
            .map(Rating::parseRating);
    Dataset<Row> ratings = spark.createDataFrame(ratingsRDD, Rating.class);
    ALS als = new ALS()
            .setMaxIter(1)
            .setRegParam(0.01)
            .setUserCol("userId")
            .setItemCol("movieId")
            .setRatingCol("rating");
    ALSModel model = als.fit(ratings);
    model.setColdStartStrategy("drop");
    Dataset<Row> rowDataset = model.recommendForAllUsers(50);

But this code uses only 100% of CPU (I have seen 800% CPU usage with other apps on the same machine). How can I correctly increase the number of threads?
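For example, is changing the master URL the right knob, or do the executor settings above matter here? A minimal sketch of what I mean, assuming `local[N]` controls the number of worker threads in local mode:

    SparkSession spark = SparkSession
            .builder()
            .appName("SomeAppName")
            // "local[8]" instead of "local" -- would this make Spark use 8 threads?
            .config("spark.master", "local[8]")
            .getOrCreate();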

