Some technical articles I have read say that a DataFrame only knows the names of its columns, not their types. However, when I call the `printSchema` method on a DataFrame myself, it prints both the column names and their types. This confuses me, and I am looking forward to your answer.
example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)

object DS {
  def main(args: Array[String]): Unit = {
    val config = new SparkConf().setAppName("sparkSql").setMaster("local[*]")
    val sc = new SparkContext(config)
    val spark = SparkSession.builder().config(config).getOrCreate()
    val seq = Seq(("aa", 1), ("bb", 2))
    import spark.implicits._
    val rdd = sc.makeRDD(seq)
    val df = rdd.toDF("name", "age")
    val ds = rdd.map(line => Person(line._1, line._2)).toDS()
    println("dataframe schema:")
    df.printSchema()
    /*
    dataframe schema:
    root
     |-- name: string (nullable = true)
     |-- age: integer (nullable = true)
    */
    println("dataset schema:")
    ds.printSchema()
    /*
    dataset schema:
    root
     |-- name: string (nullable = true)
     |-- age: long (nullable = true)
    */
  }
}
In this example, the age column in the DataFrame schema is inferred as integer, while the age column in the Dataset schema is long, matching the Long type of the age field in the case class Person.
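For what it's worth, my current understanding (which may be what those articles were getting at, though I am not sure) is that a DataFrame is just an alias for Dataset[Row], so the schema exists at runtime, but the Scala compiler only sees Row and cannot check column types at compile time. A minimal sketch, reusing the Person case class and the rdd from the example above:

```scala
import spark.implicits._

// A DataFrame is Dataset[Row]: printSchema works because the schema
// (names and types) is stored at runtime, but the compiler only knows
// the static type Row, so column references are plain strings.
val df = rdd.toDF("name", "age")
// This compiles, and would fail only at runtime with an AnalysisException,
// because "agee" is a typo the compiler cannot catch:
// df.select($"agee")

// A typed Dataset carries Person in its static type, so the same
// mistake is rejected by the compiler:
val ds = rdd.map(t => Person(t._1, t._2)).toDS()
// ds.map(_.agee)   // does not compile: value agee is not a member of Person
ds.map(_.age + 1)   // compile-time checked access to the Long field
```

If this reading is right, "knows the column but not the type" refers to compile-time type safety, not to the runtime schema that `printSchema` displays.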