
I'm trying to compute Hive table statistics from Apache Spark:

`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS')`

I also execute a statement to see what was collected:

`sqlCtx.sql('DESC FORMATTED t1')`

I can see that my stats were collected. However, when I execute the same statement in a Hive client (Ambari), no statistics are displayed. Are they available only to Spark if they were collected by Spark? Does Spark store them somewhere else?

Another question.

I also compute stats for all columns in that table:

`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS c1,c2')`

But when I want to see these stats in Spark, it fails with an unsupported SQL statement exception:

`sqlCtx.sql('DESC FORMATTED t1 c1')`

According to the docs these are valid Hive queries. What is wrong with them?

Thanks for your help.

– vvg
    Likely because Spark with Hive support enabled is not a Hive client. It is Spark which happens to "know" how to use Hive metastore and how to call Hive functions. – Alper t. Turker Apr 13 '18 at 16:48

2 Answers


Apache Spark stores the statistics as table parameters. To retrieve them, we need to connect to the Hive metastore and execute a query like the following:

select param_key, param_value 
from table_params tp, tbls t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';
– vvg
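
For illustration, here is a minimal sketch of the same lookup done from Python, assuming a MySQL-backed metastore and the `pymysql` driver; the host, credentials and database name are placeholders you would need to adapt:

```python
# Minimal sketch, assuming a MySQL-backed Hive metastore reachable with pymysql.
# Host, credentials and database name are placeholders, not real values.
import pymysql

conn = pymysql.connect(host="metastore-host", user="hive",
                       password="secret", database="hive")
try:
    with conn.cursor() as cur:
        # Same query as above; literal '%' in LIKE must be doubled
        # because we also pass query parameters.
        cur.execute(
            """
            select param_key, param_value
            from table_params tp
            join tbls t on tp.tbl_id = t.tbl_id
            where t.tbl_name = %s
              and param_key like 'spark.sql.stat%%'
            """,
            ("t1",),
        )
        for param_key, param_value in cur.fetchall():
            print(param_key, param_value)
finally:
    conn.close()
```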

Just uppercase the table names and it will work: on a case-sensitive metastore database (for example MySQL on Linux) the metastore schema creates the tables as TABLE_PARAMS and TBLS, so the lowercase names from the other answer are not found.

select param_key, param_value 
from TABLE_PARAMS tp, TBLS t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';
– starqiu
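
If you would rather not query the metastore database directly, the same table parameters can also be read from Spark itself with `SHOW TBLPROPERTIES`. A minimal PySpark sketch, assuming a SparkSession with Hive support enabled and the table `t1` from the question:

```python
# Minimal PySpark sketch, assuming Hive support is enabled and table t1 exists.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Collect table-level statistics, as in the question.
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")

# Spark stores the results as table properties,
# e.g. spark.sql.statistics.numRows and spark.sql.statistics.totalSize.
props = spark.sql("SHOW TBLPROPERTIES t1")
props.filter(props.key.startswith("spark.sql.statistics")).show(truncate=False)
```

Column statistics collected with `FOR COLUMNS` typically end up under the same prefix as well (keys starting with `spark.sql.statistics.colStats.`), so they appear in the same listing.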