I would like to read a text file directly into a DataFrame, not via file -> RDD -> DataFrame. Is that possible? I have read a lot about it, but I cannot get `read` to work.
While reading it, I also want to select specific columns by their headers.
Is there a fast solution for this?
Also, which imports do I need?
This is my Scala file:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql._

object LoadData {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Job for Loading Data").setMaster("local[*]") // local[*] uses all cores of the machine
    val sc = new SparkContext(conf) // create the Spark context

    // Load the local file
    val rdd = sc.textFile("src/main/resources/data.txt")
    val df = rdd.toDF()

    // Print the records
    rdd.foreach(println)
  }
}
And my build.sbt:
name := "HelloScala"
version := "1.0"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies ++= Seq(
  // https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
  "org.apache.spark" %% "spark-core" % "2.3.2",
  // https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11
  "org.apache.spark" %% "spark-sql" % "2.3.2"
)
I am getting the error `Error:(16, 18) value toDF is not a member of org.apache.spark.rdd.RDD[String]` on the line `val df = rdd.toDF()`.
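From what I have read, the direct approach should go through `SparkSession` instead of `SparkContext`, something like the sketch below. The file path matches my project; the column names `"col1"` and `"col2"` are placeholders since I don't know what my headers should look like here. I am not sure this is correct:

```scala
import org.apache.spark.sql.SparkSession

object LoadData {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point for DataFrames in Spark 2.x
    val spark = SparkSession.builder()
      .appName("Spark Job for Loading Data")
      .master("local[*]")
      .getOrCreate()

    // Read the file directly into a DataFrame; "header" assumes the first
    // line names the columns and the file is comma-separated
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("src/main/resources/data.txt")

    // Select specific columns by header name ("col1", "col2" are placeholders)
    df.select("col1", "col2").show()

    spark.stop()
  }
}
```

Is this the right direction, and would `import spark.implicits._` also be needed somewhere for `toDF` to work on an RDD?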
Thank you very much