-1

i have following data set. i am trying to match string in Col = "Str_B_Class" to the entire string in col = "Str_A". The str_A col has data as class:value. if the class is found in Str_A, then sql should get the value after colon into the desired_output_column , otherwise 0

The problem with below code is that if its lets say found 3132, and the value 10, then instead if returning 1 row where 3132 is matched , it is returning it 6 times. so it is exploding the whole dataset. I need some help in matching the string without exploding the whole table.

Customer_ID ||                   Str_A                     || Str_B_Class || Desired_Output
-------------------------------------------------------------------------------------------    
    A1      || 121:8|188:8|3123:10|3125:10|3131:10|3132:10 || 3132        || 10
    A1      || 121:8|188:8|3123:10|3125:10|3131:10|3132:10 || 3125        || 10
    A1      || 121:8|188:8|3123:10|3125:10|3131:10|3132:10 || 4141        || 0

Query:

select 
    s.Customer_Id, 
    s.Str_A,  
    s.Str_B_Class, 
    case when s.instance_id = s.Str_B_Class 
        then s.Count_of_instances 
        else '0' 
    end AS Desired_Output 
from ( 
    select 
        Customer_Id, Str_A,  
        Str_B_Class, 
        explode(str_to_map(Str_A,'[|]','[:]')) as (instance_id, Count_of_instances) 
    from my_table 
) as s
damientseng
  • 533
  • 2
  • 19
Samy
  • 47
  • 7
  • In your sample data, the number of columns in the header (3) does not match the number of columns in the data rows. Hence it is hard to understand what you mean. – GMB Jan 06 '20 at 22:56
  • Tag your database please – Radagast Jan 06 '20 at 23:12
  • I am adding a new column "Desired_Output" after using the case statement to check if strings matched. – Samy Jan 06 '20 at 23:22
  • Also, what i am doing in inner query is exploding the data to find values after ":". Is there a better way to do this? this data set spans to ~3M rows and i am not sure how the explode will effect the performance – Samy Jan 06 '20 at 23:26
  • @Samy: you should really consider fixing your data model: rather than storing CSV list in database columns, you should have a separate table with on row per CSV element, and a foreign key that refers to the parent table. Here is [a recommended reading for the motivations](https://stackoverflow.com/q/3653462/10676716). – GMB Jan 06 '20 at 23:31
  • @Samy Would this help? https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3714b8fffd61bd118866ea5817b444bc – Radagast Jan 07 '20 at 01:51
  • Thanks Gandalf, but the charindex() is not working in my hive environment. I am new to regexp() but is it possible to achieve this using regexp() – Samy Jan 07 '20 at 03:30
  • @Samy I standardized it a little bit. Try again with this one.https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=2fe8a87431a43060ccfc5288276e82f9 – Radagast Jan 07 '20 at 06:36

2 Answers2

1

Use Str_B_Class as a parameter to index the map generated by STR_TO_MAP:

SELECT
    Customer_Id, Str_A, Str_B_Class, 
    NVL(STR_TO_MAP(Str_A,'[|]','[:]')[Str_B_Class], '0') as Desired_Output 
FROM my_table

Have a nice day :)

damientseng
  • 533
  • 2
  • 19
  • This worked !! Thank you so much gaccio!if you get chance can you please also explain how the indexing logic work? It will help me understand this better. – Samy Jan 07 '20 at 06:58
  • The result of `STR_TO_MAP` is a map of type `map`. You can think of it as a dictionary that offers a value when given a key, if the key exists. If not, a default value of `null` is returned. And those `null`s are converted to `0` by `NVL`. – damientseng Jan 07 '20 at 07:49
0

In hive, explode is a UDTF that can generate multiple rows for a single row. While for your case, a one-to-one relation is required.