3

I have a requirement where I need to go back to previous values for a column until 1000 rows and get those previous 1000 dates for my next steps, but all those 1000 previous dates are not present for that column in the table. But I need those missing dates to get from output of the query.

When I try to run below query it is not displaying 1000 previous date values from current date.

Example: let's say only 2 dates are available for date column

date      
2019-01-16 
2019-01-19

I have come up with a query to get back 1000 dates but it is giving only nearest date as all previous back dates are missing

SELECT date FROM  table1 t
WHERE 
date >= date_sub(current_date,1000) and  dt<current_date ORDER BY date LIMIT 1

If I run above query it is displaying 2019-01-16, since previous 1000 days back date are not present it is giving nearest date ,which is 2019-01-16 but I need missing dates starting from 2016-04-23 (1000th date from current date) till before current date (2019-01-18) as output of my query.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
user2883028
  • 173
  • 6
  • 19

1 Answers1

2

You can generate dates for required range in the subquery (see date_range subquery in the example below) and left join it with your table. If there is no record in your table on some dates, the value will be null, dates will be returned from the date_range subquery without gaps. Set start_date and end_date parameters for date_range required:

set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date

set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory

with date_range as 
(--this query generates date range, check it's output
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
) 

select d.dt as date,
       t.your_col --some value from your table on date
  from date_range d 
       left join table1 t on d.dt=t.date 
order by d.dt --order by dates if necessary
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thank you.Above query might work as we know start date in advance . But in my case start date comes from same table and it changes every day as the job runs . For ex:- 1000 th day from today is 2016-04-23 but in tomorrow's run it might be 2016-04-24 and next days run it might be 2016-04-25 ..we are restricted to use this in one query and no intermediate tables can be created in our customized framewor so finding bit challenge here . – user2883028 Jan 19 '19 at 14:26
  • @user2883028 This is very easy to achieve. Customize parameters, or substitute them directly in the subquery. For example `date_sub(current_date,1000)` instead of `'${hivevar:start_date}'`. Just edit accordingly. – leftjoin Jan 19 '19 at 14:30
  • I could able to make changes and got the output , still it is exploding from 2017-01-02 since in my joining table currency does not have dates beyond that . but still as per your query it should do left outer join and bring all dates starting from 1000th previous date 2016-04-23 till 2019-01-18 .below is the query i tried – user2883028 Jan 19 '19 at 16:04
  • with date_range as (--this query generates date range, check it's output select date_add (date_sub(current_date,1000) ,s.i) as dt from ( select posexplode(split(space(datediff(current_date,date_sub(current_date,1000))),' ')) as (i,x) ) s ) select d.dt as date1, t.perioddt --some value from your table on date from date_range d left join currencytable t on d.dt=t.perioddt – user2883028 Jan 19 '19 at 16:04