0

Maybe some of you know how to solve my problem (Hana SQL or Python):

I have a column like this:

id reason
1 1;2;3;5;6;7;8;10;13
2 1;2;4;5;6
3 4;7;8;9
4 1;2;3;10;11;12;13
5 2;4

And I would like to split this column into several. But not just by delimiter, in this case it's - ';'. I need to split it to 13 different columns (max number in chain), and for each number, check if it is in a chain and if yes assign it '1', if not '0'.

As the result I would like to see a matrix like this:

id reason 1 reason 2 reason 3 reason 4 ... reason 12 reason 13
1 1 1 1 0 ... 0 1
2 1 1 0 1 ... 0 0
3 0 0 0 1 ... 0 0
4 1 1 1 0 ... 1 1
5 0 1 0 1 ... 0 0

Thanks a lot in advance for your help!

I've tried many functions (for example SUBSTRING_REGEXPR Function), but it always divides chain only by some specific delimiter.

  • please take a look at https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad and normalize your table – nbk Apr 05 '22 at 14:15
  • Thanks for your comment. But it's field from SAP, so unfortunately I'm not able to modify it. :( So I have to work with it in this way. – Julia Lukach Apr 05 '22 at 14:18
  • In what form do you get the data in python? – Banana Apr 05 '22 at 14:19
  • This table is available only in Hana Studio. But I can work with in directly in Hana or in JupiterLab via python. That's why I can solve this in both ways. – Julia Lukach Apr 05 '22 at 14:28
  • @JuliaLukach see the documentation https://blogs.sap.com/2019/02/10/split-string-into-multiple-rows-using-sql-in-sap-hana/ and for python ueds pandas with that and columsn you will find a lot of example – nbk Apr 05 '22 at 14:51

1 Answers1

0

You can use HANA's string functions to search for patterns using regular expressions. In this case, you may want to search for the pattern, where your respective reason code is surrounded by either semicolons or it's on the end of beginning of the line.

The regex to find reason code "1" should be something like this:

(^|;)1(;|^)

You can use it in a query like this:

SELECT 
    id,
    SIGN(OCCURRENCES_REGEXPR('(^|;)1(;|^)' IN reason)) AS reason1,
    SIGN(OCCURRENCES_REGEXPR('(^|;)2(;|^)' IN reason)) AS reason2,
    ...
    SIGN(OCCURRENCES_REGEXPR('(^|;)13(;|^)' IN reason)) AS reason13,    
FROM yourtable;

Note, that I've used function SIGN to make the result binary.

Mathias Kemeter
  • 933
  • 2
  • 11
  • Hello @Mathias, thanks for a good solution. But it seems it always forgets to mark the last number in the string (or when it's the only one). I suppose it's because of missing semicolon in a string. Do you know how it can be solved? – Julia Lukach Jul 26 '22 at 09:28
  • The regex say either semicolon or line end. I have to admit, I am not an expert on regular expressions. Is it possible that there is a trailing space or something similar? – Mathias Kemeter Jul 26 '22 at 10:12
  • No, I think there are no trailing spaces or smth like this. I've added CONCAT function: IN CONCAT(reason,';'). It's not the most elegant solution at all, but it works now. Thanks – Julia Lukach Jul 26 '22 at 11:19