I am creating an external Hive table over Parquet files on S3. The commands look like this:
CREATE EXTERNAL TABLE userinfo (
  user_id string,
  last_name string,
  first_name string
)
PARTITIONED BY (
  yr string,
  mo string
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://mybucket/basedir/'
TBLPROPERTIES (
  'serialization.null.format'=''
);
ALTER TABLE userinfo ADD IF NOT EXISTS PARTITION (yr='2021', mo='07');
At this point, "select count(*) from userinfo" returns 0, even though the partition directory contains data. But if I then run
ANALYZE TABLE userinfo PARTITION(yr='2021', mo='07') COMPUTE STATISTICS;
and rerun the "select count(*)..." I get the expected row count.
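One check that might help narrow this down is sketched here. It assumes the 0 comes from Hive answering count(*) from metastore statistics instead of scanning the files. hive.compute.query.using.stats is an existing Hive setting, but whether it is enabled in my environment is an assumption I have not confirmed.

```sql
-- Assumption: count(*) is being answered from metastore statistics,
-- which may be empty or zero for a partition just added over existing files.
SET hive.compute.query.using.stats=false;

-- With stats-based answering disabled, this should scan the Parquet files.
SELECT count(*) FROM userinfo WHERE yr='2021' AND mo='07';

-- Show the partition-level parameters (such as numRows) that Hive has recorded.
DESCRIBE FORMATTED userinfo PARTITION (yr='2021', mo='07');
```

If the count is correct with the setting disabled, and numRows for the partition only becomes non-zero after the ANALYZE step, that would point at stale statistics and not at a problem with the table definition.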
This isn't a show-stopper, but it makes me think I'm either doing something wrong or leaving out a required step, and that this is what causes the strange behavior. Any insights are welcome.