I have the following questions about statistics collection on tables in Apache Spark:
- Where are the collected statistics stored? In the metastore?
- In a system where Spark and Hive share a metastore, are statistics collected on a Hive table by a Hive application made available to the Spark optimizer? Conversely, are statistics collected by Spark on a Hive table made available to the Hive optimizer?
- Is it possible to force Spark to collect statistics on a DataFrame loaded in memory, or on a temporary table created from a DataFrame?