My query
insert into table hist
Select subs2.SUBS_KEY
, subs2.FACT_DATE
, subs2.TYPE
, subs2.BALANCE
, '${DATE_1D}' as TIME_KEY
from subs2
where CNT < 15 and AMT > 0;
where '${DATE_1D}'
is hivevar DATE_1D=2017-02-28
after it is executed number of rows goes from
db.hist stats: [numFiles=22168, numRows=254582570, totalSize=19188669803, rawDataSize=19613070967]
to
db.hist stats: [numFiles=22268, numRows=257376901, totalSize=19386949977, rawDataSize=19808556810]
difference = 2794331 rows
But when I do
insert into table db.new_hist
select *
from db.hist
where time_key = '2017-02-28';
I get
db.new_hist stats: [numFiles=201, numRows=178798, totalSize=2227208, rawDataSize=12510497]
I created new_hist
using create table like
result of show create
CREATE TABLE `db.hist`(
`subs_key` string,
`fact_date` string,
`type` int,
`balance` double,
`time_key` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
Why does the number of rows not match?
I did a
Select count(*)
from hist
with and without
set hive.compute.query.using.stats=true;
result in both cases = 246210052
for some reason logs for insert showed the wrong number of rows.