There are a lot of materials on the internet that are comparing RDD and DataFrame, but almost all of them only mentioned that Dataframe can perform additional optimizations because dataframe has additional schema information.
But almost all of them didn't explain what is means that Dataframe has additional schema information that RDD doesn't have, and how Dataframe technically works that brings this performance enhancement.
Say, we have an RDD[Person], Doesn't the RDD knows its element type is Person?and I don't think RDD[Person] has less information than DataFrame(which is Dataset[Person]) that will be used to perform optimization.