0

My query

insert into table hist
Select subs2.SUBS_KEY
, subs2.FACT_DATE
, subs2.TYPE
, subs2.BALANCE
, '${DATE_1D}' as TIME_KEY
from subs2
where CNT < 15 and AMT > 0;

where '${DATE_1D}' is hivevar DATE_1D=2017-02-28

after it is executed number of rows goes from

db.hist stats: [numFiles=22168, numRows=254582570, totalSize=19188669803, rawDataSize=19613070967]

to

db.hist stats: [numFiles=22268, numRows=257376901, totalSize=19386949977, rawDataSize=19808556810]

difference = 2794331 rows

But when I do

insert into table db.new_hist
select *
from db.hist
where time_key = '2017-02-28';

I get

db.new_hist stats: [numFiles=201, numRows=178798, totalSize=2227208, rawDataSize=12510497]

I created new_hist using create table like

result of show create

CREATE TABLE `db.hist`(
  `subs_key` string,
  `fact_date` string,
  `type` int,
  `balance` double,
  `time_key` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

Why does the number of rows not match?

I did a

Select count(*)
from hist

with and without

set hive.compute.query.using.stats=true;

result in both cases = 246210052

for some reason logs for insert showed the wrong number of rows.

gjin
  • 860
  • 1
  • 14
  • 28

0 Answers0