0

I am using an Oracle database and I'm trying to select values between parentheses. Here is my table, which has ID and Roads as columns. I have read-only access to this database so I can only use SELECT:

ID   Roads
--   -----
1    #Chaussée de Waterloo (Ixelles)#
2    #Rue Reper-Vreven (Bruxelles)#
3    #Rue des Fraises (Anderlecht)#
4    #Chaussée de Roodebeek (Woluwe-Saint-Lambert)#
5    #Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#

Basically, from the Roads column, I only want to keep the values between parentheses. As the final query has other tables in it, I want a select distinct. The desired output is:

 ID    Roads
------------------
  1      Ixelles  
  2      Bruxelles 
  3      Anderlecht
  4      Woluwe-Saint-Lambert
  5      Etterbeek, Etterbeek, Etterbeek

I tried the following query, which works fine when there is only one set of parentheses, but this doesn't work when there are several (like for ID 5), as it only gives back the values in the first set of parentheses:

select distinct substr(roads, instr(roads,'(') + 1, instr(roads,')') - instr(roads,'(') - 1) as roads 
from table

Does anyone know where I'm going wrong?

Gary_W
  • 9,933
  • 1
  • 22
  • 40
philippe
  • 151
  • 2
  • 9
  • 4
    mistake #1: not normalizing your data set. – Marc B Oct 27 '15 at 14:09
  • @Marc B What does that mean specifically? – philippe Oct 27 '15 at 14:15
  • Data normalization is primer for Relational DB – Lukasz Szozda Oct 27 '15 at 14:17
  • https://en.wikipedia.org/wiki/Database_normalization – Marc B Oct 27 '15 at 14:17
  • @lad2025 Thanks for the info, the problem is that I have read only access to the database, which means i can only do "select" queries... – philippe Oct 27 '15 at 14:19
  • 1
    @lad2025 thanks for the suggestion but it only returns the values between the first set of brackets... – philippe Oct 27 '15 at 14:35
  • Check this [SO answer](http://stackoverflow.com/a/26878365/3854195) to see if `regex_substr()` meets your needs. – Morpheus Oct 27 '15 at 14:43
  • If there are multiple sets do you need what is in all of them? Is there a minimum or maximum amount of sets? Is there at least always one? Could a set exist with nothing in it? Please update the original post with more details. Thank you. – Gary_W Oct 27 '15 at 14:55
  • @Gary_W yes if there are multiple sets I need what is in all of them. There is always at least one set but no maximum (its not limited). A set has always something in it – philippe Oct 27 '15 at 15:00
  • despite of whether data model is ok or not, you can use this one: http://stackoverflow.com/questions/17596363/oracle-11g-get-all-matched-occurences-by-a-regular-expression – przemo_pl Oct 27 '15 at 15:14
  • It would help a lot if your question included the exact output you'd like to get for your example data. – rob mayoff Oct 28 '15 at 03:26
  • @robmayoff in this case the output i want is: Ixelles Bruxelles Anderlecht Woluwe-Saint-Lambert Etterbeek, Etterbeek, Etterbeek – philippe Oct 28 '15 at 07:37

3 Answers3

0

One solution - based in part on the one referenced by przemo_pl:

SELECT SUBSTR( with_parentheses,2,length(with_parentheses)-2) between_parenthesis
FROM 
  (select REGEXP_SUBSTR(dat, '\([^()]*\)+',1,level) AS with_parentheses 
     from (select '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Anderlecht)#Avenue Le Marinel (Ixelles)#' as dat from dual )
   connect by LEVEL <= ( LENGTH(dat) - LENGTH(REPLACE(dat, '(', '')))    
  )

returns:

between_parenthesis
---------------------
"Etterbeek"
"Anderlecht"
"Ixelles"

If you want this re-assembled into a single row then that adds another wrinkle.

Michael Broughton
  • 4,045
  • 14
  • 12
0

Please see my other post for a much simpler answer. I am leaving this here as it is still an interesting approach and a lesson on how over-thinking a solution can get way too complicated and sometimes one must just step back and approach a problem a different way. :-)

Ok, you need to loop through the rows and through the parentheses within the rows:

with tbl(ID, Roads) as (
  select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
  union
  select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
  union
  select 3, '#Rue des Fraises (Anderlecht)#' from dual
  union
  select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
  union
  select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
    )
    SELECT ID, Roads,
           COLUMN_VALUE AS match_nbr,
          REGEXP_SUBSTR( Roads ,'\(([^\)]*)\)', 1, COLUMN_VALUE, NULL, 1 ) AS match_value
   FROM   tbl,
          TABLE(
            CAST(
              MULTISET(
                SELECT LEVEL
                FROM   DUAL
                CONNECT BY LEVEL <= REGEXP_COUNT( Roads ,'\(' )
              ) AS SYS.ODCINUMBERLIST
            )
          );

Result:

enter image description here

See here for a similar post, which links to another post which provides more info. I don't claim to understand it fully. :-)

EDIT: Updated to get list of roads on one line using listagg( ):

SQL> with tbl(ID, Roads) as (
     select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
     union
     select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
     union
     select 3, '#Rue des Fraises (Anderlecht)#' from dual
     union
     select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
     union
     select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
       )
       select id,
       listagg(match_value, ', ') within group (order by id) road_list
       from (
       SELECT ID, Roads, COLUMN_VALUE AS match_nbr,
             REGEXP_SUBSTR( Roads ,'\(([^\)]*)\)', 1, COLUMN_VALUE, NULL, 1 ) AS match_value
      FROM   tbl,
             TABLE(
               CAST(
                 MULTISET(
                   SELECT LEVEL
                   FROM   DUAL
                   CONNECT BY LEVEL <= REGEXP_COUNT( Roads ,'\(' )
                 ) AS SYS.ODCINUMBERLIST
               )
             )
       )
       group by id
       order by id
       ;

        ID ROAD_LIST
---------- --------------------------------------------------
         1 Ixelles
         2 Bruxelles
         3 Anderlecht
         4 Woluwe-Saint-Lambert
         5 Etterbeek, Etterbeek, Etterbeek

SQL>
Community
  • 1
  • 1
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • thanks, that indeed works but in myspecific case, i also need to use a select distinct, as another column in my table needs unique values... – philippe Oct 27 '15 at 16:03
  • Nothing about that was mentioned in the original specs. – Gary_W Oct 27 '15 at 18:09
  • Do you mean the street like Etterbeek should not be repeated? – Gary_W Oct 27 '15 at 18:57
  • sorry indeed this query is only a small part of a much bigger query. Indeed I only want distinct values. Etterbeek should be repeated but on the same line if that makes sence – philippe Oct 28 '15 at 07:40
  • @philippe Ok, I used listagg( ) to do it. See updated post. – Gary_W Oct 28 '15 at 14:03
  • Yes, there has got to ba an easier way, it must be possible using substring and instring – philippe Oct 28 '15 at 14:30
  • Yikes that will be a maintenance nightmare (nested instr(), substr() are just awful to work with) plus how to handle a multiple roads where the number is unknown? I just had a brainstorm! See new post in a few minutes... – Gary_W Oct 28 '15 at 14:39
0

I am adding this as a new answer as it is so different from my previous one, which was a classic example of over-complicated thinking getting worse with each iteration and thus is still a good example of that! lol Sometimes you just need to feel when you are getting too complicated and don't be afraid to start over on a different tack!

Ok, check this out. I went back to square one and studied the string for a pattern. Whether one road or more (actually this design of multiple values in one column violates basic data modeling tenets and should be reworked but who hasn't had to deal with a crappy design that we have no control over?), each road is surrounded by pound signs. My thought was to loop through the string using a regular expression, replacing the pattern of a pound sign through the closing parentheses with what is inside the parentheses. Granted this leaves a pound sign at the end but we'll clean that up later. Note that REGEXP_REPLACE will replace all occurrences of the pattern if found so by default loops through all roads and is WAY easier to maintain then a fugly mass of nested INSTR(), SUBSTR() :

SQL> with tbl(ID, Roads) as (
     select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
     union
     select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
     union
     select 3, '#Rue des Fraises (Anderlecht)#' from dual
     union
     select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
     union
     select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
   )
   select ID, rtrim(regexp_replace(Roads, '#.+?\((.+)\)', '\1, '), ', #') Roads
   from tbl;

        ID ROADS
---------- ----------------------------------------
         1 Ixelles
         2 Bruxelles
         3 Anderlecht
         4 Woluwe-Saint-Lambert
         5 Etterbeek, Etterbeek, Etterbeek

SQL>

The regular expression pattern explained:

#    Look for a literal pound sign
.    followed by any character
+    followed by one or more of the previous character (any character)
?    make the previous character optional (one or more any characters)
\(   a literal left paren
(    start remembered group 1
.    any character
+    one or more "any" characters
)    end remembered group 1
\)   followed by a literal closing right paren

If the above string is found, replace with the "replace-with" string:

\1       The first remembered group which is what is inside the parentheses.
,<space> followed by a comma and a space

Then its a tad quick and dirty, but just use RTRIM to remove the trailing comma-space-pound sign. Wha-la! Whew.

Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • Yes that does work, thanks a lot, indeed sometimes its better to go back to somthing easier. – philippe Oct 28 '15 at 15:05
  • If this worked as the solution for you please mark it to show future searchers what worked. Thanks this was interesting. – Gary_W Oct 28 '15 at 15:11
  • I did but as I am new to this website and have not enough reputations, it doesnt appear publically. But thanks for your help again. – philippe Oct 28 '15 at 15:33
  • Thansk again for the answer @gary_w, but I don'y really understand the syntax for the regular expression – philippe Oct 29 '15 at 10:01
  • Regular expressions are a language unto itself. There is a basic core common to all flavors with some functionality dependent on the version or OS/tool you are using. Since this is Oracle related start here but once you learn it it is very pwerful. Do some searching too there are lots of resources on learning regex out there. Good luck! http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_regexp.htm – Gary_W Oct 29 '15 at 13:28