
Hi everyone, I have some questions about running Spark via IntelliJ IDEA. If someone can offer me help, I would really appreciate it. Thank you so much. I googled the errors and tried the suggestions, but they changed nothing or even made the result worse, so I have kept the code as it originally was.

I have written some simple Scala code to test running Spark via IntelliJ IDEA, but some errors came up. My questions are:

1. Please have a look at pictures 1 & 2. There are two errors, "cannot resolve symbol ===" and "value $ is not a member of StringContext"; the details are in picture 3.

2. If I comment out the problematic lines with "//", the code runs and the DataFrame can be read and shown, but the line that calculates the mean value doesn't work. The errors are shown in pictures 4 & 5.

Could anyone please help me resolve these two problems? Thank you so much!

(Screenshots: demo, pom.xml, error1, error2, final result)

Here is my pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.test.demo</groupId>
    <artifactId>DemoProject</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
    </dependencies>

</project>

Here is the code of my Scala object:

import org.apache.spark.sql.SparkSession
import java.io.File
import org.apache.spark
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object Demo {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("Spark SQL basic example")
      .getOrCreate()

    val peopleDFCsv = spark.read.format("csv")
      .option("sep", "\t")
      .option("header", "false")
      .load("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")
    peopleDFCsv.printSchema()

    peopleDFCsv.show(15)


    val df = spark.read.option("inferScheme", "true").option("header", "true").csv("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")
    df.show()
    df.withColumn("id", monotonically_increasing_id()).show
    df.join(df.groupBy("max(marks)"), $"marks" === $"max(marks)", "leftsemi").show
    df.join(df.filter("subject = maths").groupBy("max(marks)"). $"marks" === $"max(marks)", "leftsemi").show
    df.join(df.filter("subject = maths").select(mean(df("marks")))).show

    //    val a = new File("./data").listFiles()
    //    a.foreach(file => println(file.getPath))
  }
}
  • Does this answer your question? [“value $ is not a member of StringContext” - Missing Scala plugin?](https://stackoverflow.com/questions/44209756/value-is-not-a-member-of-stringcontext-missing-scala-plugin) – user10938362 Mar 03 '20 at 17:15
  • thanks for the reminder, but I tried that before and it doesn't work at all; the problems are still the same, so I asked here. – MJfortruth Mar 04 '20 at 05:24
  • @MJfortruth: for your error2: https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha – Shaido Mar 04 '20 at 10:01
  • hi, I also tried this before. I'm using Win10, so I set the path in "advanced system settings", and I also imported Hadoop in IntelliJ, but I still get the same error :( "ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path" – MJfortruth Mar 04 '20 at 16:02
  • Do I need to put winutils into the "dependencies"? – MJfortruth Mar 04 '20 at 16:03
  • @MJfortruth: Did you actually download the winutils.exe and place it in the bin folder as described in the linked answers? – Shaido Mar 05 '20 at 01:06

1 Answer


So, the first reason for the error on the join function is wrong arguments. In your first join, groupBy returns a RelationalGroupedDataset, which is not a DataFrame; you need to aggregate something after the groupBy call. Look at my example. To use $ in column names you need to import spark.implicits._, and you also need import org.apache.spark.sql.functions._ to use the standard Spark column functions:

// note: import spark.implicits._ only works after the SparkSession value `spark` has been created
import spark.implicits._
import org.apache.spark.sql.functions._
import java.io.File

val df = spark.read.option("inferSchema", "true").option("header", "true").csv("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")
df.show()
// df.withColumn("id", monotonically_increasing_id()).show()

// alias the aggregate so the join condition can reference it by name
// ("column_for_group" is a placeholder for your grouping column, e.g. "subject")
df.join(df.groupBy("column_for_group").agg(max("marks").as("max_marks")), $"marks" === $"max_marks", "leftsemi").show()
df.join(df.filter("subject = 'maths'").groupBy("column_for_group").agg(max("marks").as("max_marks")), $"marks" === $"max_marks", "leftsemi").show()
// no join condition here, so this is a cross join against the single mean row
df.crossJoin(df.filter("subject = 'maths'").select(mean(df("marks")).as("mean_marks"))).show()

val a = new File("./data").listFiles()
a.foreach(file => println(file.getPath))
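
As a side note, since your original code already imports org.apache.spark.sql.expressions.Window, an alternative sketch that avoids the self-join entirely could look like the following. It assumes the grouping column is called "subject" (inferred from your filter("subject = 'maths'")); adjust the name to your real data.

import org.apache.spark.sql.expressions.Window

// partition the rows by subject; "subject" is an assumption, change it to your actual grouping column
val bySubject = Window.partitionBy("subject")

// keep only the rows whose marks equal the per-subject maximum
df.withColumn("max_marks", max($"marks").over(bySubject))
  .filter($"marks" === $"max_marks")
  .drop("max_marks")
  .show()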

The winutils error can be resolved by setting the correct environment variables: HADOOP_HOME should point to a folder that contains bin\winutils.exe. You can google something like "Windows 10 how to set an environment variable". It is better to do it at the system level, not programmatically.
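
If you do want a quick programmatic workaround for local testing only, the sketch below sets hadoop.home.dir before the SparkSession is built. It assumes you downloaded winutils.exe into C:\hadoop\bin; adjust the path to wherever you actually placed it.

import org.apache.spark.sql.SparkSession

// assumed location: C:\hadoop\bin\winutils.exe - change this to your actual folder
// must run before the SparkSession (and hence Hadoop's Shell utilities) is initialised
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("Spark SQL basic example")
  .getOrCreate()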

Boris Azanov