Hi everyone, I have a question about running Spark via IntelliJ IDEA, and I would really appreciate any help. I googled the errors and tried the suggestions I found, but nothing changed (or the result got worse), so I have kept the code in its original form.
I wrote some simple Scala code to test running Spark via IntelliJ IDEA, but a few errors came up. My questions are:
1. Please have a look at pics 1 & 2. There are two errors, "cannot resolve symbol ===" and "value $ is not a member of StringContext"; the details are in pic 3.
2. If I comment out the offending lines with "//", the code runs and the DataFrame can be read and shown, but the line that calculates the mean value doesn't work. Those errors are shown in pics 4 & 5.
Could anyone please help me resolve these two problems? Thank you so much!
Here is my pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.test.demo</groupId>
    <artifactId>DemoProject</artifactId>
    <version>1.0-SNAPSHOT</version>
    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
    </dependencies>
</project>
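One thing I noticed while reading around: my pom.xml only pulls in Hadoop, with no Spark dependencies at all, and from what I understand $"..." and === come from Spark SQL. So maybe something like the following is missing (just my guess; the Spark 2.4.5 / Scala 2.11 versions are assumptions, since I haven't confirmed which ones match my setup):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.5</version>
    </dependency>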
Here is my Scala code (the Demo object):
import org.apache.spark.sql.SparkSession
import java.io.File
import org.apache.spark
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("Spark SQL basic example")
      .getOrCreate()

    val peopleDFCsv = spark.read.format("csv")
      .option("sep", "\t")
      .option("header", "false")
      .load("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")
    peopleDFCsv.printSchema()
    peopleDFCsv.show(15)

    val df = spark.read.option("inferSchema", "true").option("header", "true").csv("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")
    df.show()

    df.withColumn("id", monotonically_increasing_id()).show
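    // these next two lines are where IntelliJ shows "cannot resolve symbol ===" and
    // "value $ is not a member of StringContext" (pics 1, 2, and 3)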
    df.join(df.groupBy("max(marks)"), $"marks" === $"max(marks)", "leftsemi").show
    df.join(df.filter("subject = maths").groupBy("max(marks)"), $"marks" === $"max(marks)", "leftsemi").show
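    // when the two joins above are commented out, everything else runs, but this
    // line, which should calculate the mean, fails (pics 4 & 5)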
    df.join(df.filter("subject = maths").select(mean(df("marks")))).show

    // val a = new File("./data").listFiles()
    // a.foreach(file => println(file.getPath))
  }
}
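In case my intent is not clear from the broken lines: I am trying to (a) keep only the row(s) whose marks equal the maximum, and (b) compute the mean marks for the maths subject. Written out step by step, this is roughly what I mean (my own untested sketch; it assumes the Spark SQL dependency is on the classpath, that marks is numeric, and that subject holds string values like "maths"):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DemoIntent {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("Demo intent sketch").getOrCreate()
    import spark.implicits._ // as far as I understand, this provides the $"..." column syntax

    val df = spark.read.option("inferSchema", "true").option("header", "true")
      .csv("C:\\Users\\shell\\OneDrive\\Desktop\\marks.csv")

    // (a) keep only the row(s) whose marks equal the overall maximum
    val maxMarks = df.agg(max("marks")).collect()(0)(0)
    df.filter($"marks" === maxMarks).show()

    // (b) mean marks for the maths subject (note the quotes around the string literal)
    df.filter($"subject" === "maths").select(mean("marks")).show()

    spark.stop()
  }
}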