0

I am trying to get each different value of string between delimiters in MYSQL. I tried using function SUBSTRING_INDEX, it works for the first string and the continuation of that first string, but not the second string. Here is what I meant:

Table x                    The result

enter image description here

SELECT SUBSTRING_INDEX(path, ':', 2) as p, sum(count) as N From x Group by p UNION
SELECT SUBSTRING_INDEX(path, ':', 3) as p, sum(count) From x Group by p UNION
SELECT SUBSTRING_INDEX(path, ':', 4) as p, sum(count) From x Group by p UNION
SELECT SUBSTRING_INDEX(path, ':', 5) as p, sum(count) From x Group by p UNION
SELECT SUBSTRING_INDEX(path, ':', 6) as p, sum(count) From x Group by p;

I tried adding SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, ':', 2), ':', 2) as p, sum(count) From x Group by p UNION SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, ':', 4), ':', 2) as p, sum(count) From x Group by p in the query, but the result is still the same. What I am trying to do is get the result of not only string A1, A2, A3 combination, but also string with B2, C2, D2 as the first string fetched like in table below:

+---------------+----+
|   p           |  N |
+---------------+----+
| :A1           | 4  |
| ...           | ...|
| :B1           | 3  |
| :B1:C2        | 2  |
|...            | ...|
+---------------+----+

What's the correct function to get the result like that? Any help is appreciated, thank you.

Jess Moi
  • 49
  • 1
  • 8

1 Answers1

1

assumes all string nodes on path are two characters long and all paths are the same length..

plan

  • creates a sequence of valid substrings from some start to the end of the path using the fixed length of 2 for each chunk..
  • joins above to itself to get paths which dont go to the end of the path
  • takes substring on x.path using above computed substring indexes
  • aggregates sum over above x.path subsequences

setup

create table x
(
  path varchar(23) primary key not null,
  count integer not null
);

insert into x
( path, count )
values
( ':A1:B2:C1:D1:G1' , 3 ),
( ':A1:B2:C1:D1:G4' , 1 ),
( ':A2:B1:C2:D2:G4' , 2 )
;

drop view if exists digits_v;
create view digits_v
as
select 0 as n
union all
select 1 union all select 2 union all select 3 union all 
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
;

query

select substring(x.path, `start`, `len`) as chunk, sum(x.count)
from x
cross join
(
  select o1.`start`, o2.`len`
  from
  (
    select 1 + 3 * seq.n as `start`, 15 - 3 * seq.n as `len`
    from digits_v seq
    where 1 + 3 * seq.n between 1 and 15
    and   15 - 3 * seq.n  between 1 and 15
  ) o1
  inner join
  (
    select 1 + 3 * seq.n as `start`, 15 - 3 * seq.n as `len`
    from digits_v seq
    where 1 + 3 * seq.n between 1 and 15
    and   15 - 3 * seq.n  between 1 and 15
  ) o2
  on  o2.`start` >= o1.`start` 
) splices
where substring(x.path, `start`, `len`) <> ''
group by substring(x.path, `start`, `len`)
order by length(substring(x.path, `start`, `len`)), substring(x.path, `start`, `len`)
;

output

+-----------------+--------------+
|      chunk      | sum(x.count) |
+-----------------+--------------+
| :A1             |            4 |
| :A2             |            3 |
| :A3             |            3 |
| ...             |          ... |
| :A1:B2          |            4 |
| :A2:B1          |            3 |
| :A3:B3          |            2 |
| :A3:B4          |            1 |
| ...             |          ... |
| :A1:B2:C1       |            4 |
| :A2:B1:C2       |            2 |
| :A2:B1:D2       |            3 |
| :A3:B3:C4       |            2 |
| :A3:B4:C2       |            1 |
| ...             |          ... |
| :A1:B2:C1:D1    |            4 |
| :A2:B1:C2:D2    |            2 |
| :A3:B3:C4:D3    |            2 |
| :A3:B4:C2:D3    |            1 |
| ...             |          ... |
| :A1:B2:C1:D1:G1 |            3 |
| :A1:B2:C1:D1:G4 |            1 |
| :A2:B1:C2:D2:G4 |            2 |
| :A3:B3:C4:D3:G7 |            2 |
| :A3:B4:C2:D3:G7 |            1 |
+-----------------+--------------+

sqlfiddle

amdixon
  • 3,814
  • 8
  • 25
  • 34
  • Yes, the output is exactly what I need. Thankyou so much! I'll learn your queries to understand the functions. – Jess Moi Jan 03 '16 at 16:54
  • main thing is to understand the sequence generator and the valid substrings ( from the fixed length 2 nodes ). if the nodes become variable length, the complexity will increase even more ;) – amdixon Jan 03 '16 at 16:55