1

is there any way to replace digits using substring in Hive.

scenario:

1. I need to check 0's from right to left and as soon as i find 0 before any digit then i need to replace all the trailing 0's by 9.
2. In output at-least 3  digit should be there before 9.
3. In Input if 2 or less digits are available in input then I need to skip some 0's and make sure that at-least 3 digits are there before 9.
4. If more than 3 digits are available before trailing 0's then only need to replace 0.No need to replace digits.

see the below table

input     output

123000    123999
120000    120999
123400    123499
101010    101019

I have tried using below query, and it is working as expected.(Hive Join with CTE)

with mytable as (
select '123000' as input 
union all 
select '120000' as input 
union all 
select '123400' as input 
union all 
select '101010' as input
) 
select input,lpad(concat(splitted[0], translate(splitted[1],'0','9')),6,0) as output 
from (
select input, split(regexp_replace(input,'(\\d{3,}?)(0+)$','$1|$2'),'\\|') splitted from mytable )s;

but In my actual query which is more than 500+ lines,it is very difficult to adjust this logic (with CTE) for the sigle column. so wondering if is there any way to achieve the same using only lpad/rpad and substring/length and can achieve by adding the functions without using CTE queries.

so let say if length of digits before trailing 0's is less than 6 then can skip the substring 
from (input,1,6) and will replace the remaining 0's and if  length of digits before trailing 0's is  6 
or more then 6 then just keep digits as it is and replace remaining trailing 0's by 9.

Kindly Suggest.

My Actual Query Looks like.

with mytable as
(
select lpad(input,13,9) as output from mytable where code='00'
union
select output  from mytable where code='01'
)
select t1.*,m1.output from table1 t1 , mytable m1 where  
(t1.card='00' and substr(t1.low,1,13)<=m1.low and m1.output <= substr(t1.output,1,13) and m1.card='00' )
or
(t1.card='01' and substr(t1.low,1,16)<=m1.low and m1.output <=  substr(t1.output,1,16) and m1.card='01' )

I want to Replace above logic for 2nd output where code=01 in union query.

Sonu
  • 77
  • 11
  • The only CTE in your query is `mytable`, which is a demo datase and I suppose you can use your real table instead of CTE. So it is not clear what is a problem with CTE, just remove CTE, replace mytable in FROM with your table name – leftjoin Mar 10 '22 at 12:34
  • with mytable as (select lpad(token_bin_h,13,9) as output from mytable where code='00' union select output from mytable where code='01' ) select t1.*,.... //Will write Query Logic here and will use above output column here for Implementation. @leftjoin my actual query looks like this. And I need to put this logic for output column where code='01' can You Suggest How should I put this logic instead of Output in second select after union where code='01' – Sonu Mar 10 '22 at 13:02
  • Please write your actual query in question and describe the problem in a clear way, sorry I could not get it – leftjoin Mar 10 '22 at 13:10
  • done ...Updated In Question.@leftjoin – Sonu Mar 10 '22 at 13:50
  • Why ere you using UNION, not UNION ALL? Because you want DISTINCT records? – leftjoin Mar 10 '22 at 14:23
  • Yes @leftjoin. to avoid duplicates. I have uploaded i my actual query in question – Sonu Mar 10 '22 at 14:25

2 Answers2

0

Instead of UNION you can do the same using single query:

Instead of this

select lpad(output,13,9) as output from mytable where code='00'
union
select output  from mytable where code='01'

use this

select distinct
       case code when '00' then lpad(output,13,9)
                 when '01' then output  
        end output  
  from mytable 

If you need to filter codes, add where code in ('00','01')

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • As I mentioned in my question I need to add the logic of output filed in this query and I am Trying to add that logic in this query.. – Sonu Mar 10 '22 at 14:47
  • with mytable as ( select input ) select input,lpad(concat(splitted[0], translate(splitted[1],'0','9')),16,0) as output from ( select input, split(regexp_replace(input,'(\\d{*?}?)(0+)$','$1|$2'),'\\|') splitted from mytable )s; – Sonu Mar 10 '22 at 14:47
  • this is the logic for that field and I want to use the same in the query for output field – Sonu Mar 10 '22 at 14:48
  • I think lot of confusion here. So I have Created new Thread and mention only Required Details with actual Query and the logic which need to update. https://stackoverflow.com/questions/71426337/replace-digits-using-substring-and-lpad-rpad-in-hive Kindly help here . @leftJoin – Sonu Mar 10 '22 at 15:06
  • @Sonu in your new question there is still no input data example and desired output data example. The confusion is not resolved. Try to ask question in a clear and simple way. What are you trying to do with that code is not clear at all and nobody wants to analyze that code and requirements. You cold make it much simpler. Just add input data ad output data for all cases, describe which columns do you need as output and what you have in the table – leftjoin Mar 10 '22 at 17:38
  • Noted. I have tried to simplify the question, added expected Result reduce query size and kept only requested Column logic details. kindly let me know if any Information is required. I have Updated the question. https://stackoverflow.com/questions/71426337/replace-digits-using-substring-and-lpad-rpad-in-hive @LeftJoin – Sonu Mar 10 '22 at 19:01
0

It Worked Now.I modified My query as below.

with mytable as
(
select lpad(input,13,9) as output from mytable where code='00'
union
select lpad(concat(split(regexp_replace('(\\d{6,}?)(0+)$','$1|$2'),'\\|') [0], 
translate(split(regexp_replace(input,'(\\d{6,}?)(0+)$','$1|$2'),'\\|')[1],'0','9')),16,0 )
output  from mytable where code='01'
)
select t1.*,m1.output from table1 t1 , mytable m1 where  
(t1.card='00' and substr(t1.low,1,13)<=m1.low 
 and m1.output <= substr(t1.output,1,13) and m1.card='00' )
 or
(t1.card='01' and substr(t1.low,1,16)<=m1.low 
and output <=  substr(output,1,16) and m1.card='01')
Sonu
  • 77
  • 11