ANALYZE TABLE showing NULLs for all statistics in Spark

Question

I am trying to compute statistics for a table and then read the statistics for individual columns, but every statistic comes back as NULL for every column. I'm not sure what mistake I'm making here.

ordersSchemaDDL = "orderid Int, ordertime Timestamp, custid Int, Status String"

orders_df = spark.read \
    .format("csv") \
    .option("header", True) \
    .schema(ordersSchemaDDL) \
    .option("mode", "DROPMALFORMED") \
    .option("path", "orders.csv") \
    .load()

spark.sql("create database if not exists saveAsTable")

spark.sql("ANALYZE TABLE saveAsTable.orders_bucketed COMPUTE STATISTICS;")
spark.sql("DESCRIBE EXTENDED saveAsTable.orders_bucketed orderid;").show(truncate=False)

Orders table (as you can see, it contains plenty of data):

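    +-------+-------------------+------+---------------+
    |orderid|          ordertime|custid|         Status|
    +-------+-------------------+------+---------------+
    |      1|2013-07-25 00:00:00| 11599|         CLOSED|
    |      2|2013-07-25 00:00:00|   256|PENDING_PAYMENT|
    |      3|2013-07-25 00:00:00| 12111|       COMPLETE|
    |      4|2013-07-25 00:00:00|  8827|         CLOSED|
    |      5|2013-07-25 00:00:00| 11318|       COMPLETE|
    |      6|2013-07-25 00:00:00|  7130|       COMPLETE|
    +-------+-------------------+------+---------------+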
    



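Statistics output:

    +--------------+----------+
    |info_name     |info_value|
    +--------------+----------+
    |col_name      |orderid   |
    |data_type     |int       |
    |comment       |NULL      |
    |min           |NULL      |
    |max           |NULL      |
    |num_nulls     |NULL      |
    |distinct_count|NULL      |
    |avg_col_len   |NULL      |
    |max_col_len   |NULL      |
    |histogram     |NULL      |
    +--------------+----------+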
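
A likely explanation, offered as a sketch rather than a confirmed diagnosis: ANALYZE TABLE ... COMPUTE STATISTICS without a FOR COLUMNS clause collects only table-level statistics (size in bytes and row count), so the per-column fields reported by DESCRIBE EXTENDED <table> <column> stay NULL. The helpers below (analyze_sql and show_column_stats are hypothetical names, not part of Spark) build the column-level variant of the statement. They assume the table saveAsTable.orders_bucketed has already been written, which the snippet above does not show, and that FOR ALL COLUMNS is supported by the Spark version in use (it should be in Spark 3.0 and later).

```python
def analyze_sql(table, columns=None):
    """Build an ANALYZE TABLE statement that collects column-level statistics.

    columns=None        -> FOR ALL COLUMNS (assumes a Spark version that supports it)
    columns=["a", "b"]  -> FOR COLUMNS a, b
    """
    if columns is None:
        target = "FOR ALL COLUMNS"
    else:
        if not columns:
            raise ValueError("columns must be None or a non-empty list")
        target = "FOR COLUMNS " + ", ".join(columns)
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS {target}"


def show_column_stats(spark, table, column):
    """Hypothetical helper: analyze one column, then print its statistics.

    `spark` is an existing SparkSession; the table must already exist.
    """
    spark.sql(analyze_sql(table, [column]))
    spark.sql(f"DESCRIBE EXTENDED {table} {column}").show(truncate=False)
```

With a live session this would be called as show_column_stats(spark, "saveAsTable.orders_bucketed", "orderid"). Even then the histogram row is expected to remain NULL unless spark.sql.statistics.histogram.enabled is set to true before running ANALYZE.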
