5

I have a text field which looks like:

option[A]sum[A]g3et[B]

I want to get the text which is inside the [ ] without duplicates. Meaning to get:

A
B

There can't be a case of double like [ [ ] ].

I know this is a horrible way to save data in databases. I can not change how the data is saved. I just need to get a very specific (one time) information from this column.

I tried to do:

SELECT substring_index(substring_index(sentence, '[', -1),']', 1)
FROM (SELECT 'THIS[A] IS A TEST' AS sentence) temp;

This gives me A, but it will not work for many [].

I thought of using regex however I don't know how many [ ] I have.

How do I do that?

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
m m
  • 103
  • 7

1 Answers1

6

It is not job for DB but it is possible:

CREATE TABLE tab(id INT, col VARCHAR(100));           
INSERT INTO tab(id, col) 
VALUES (1, 'option[A]sum[A]g3et[B]'), (2, '[Cosi]sum[A]g3et[ZZZZ]');      

SELECT DISTINCT *
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val
    FROM tab t 
    CROSS JOIN 
    (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
ORDER BY ID;

SqlFiddleDemo

Note:

Depending on col maximum length you may need to generate more numbers in CROSS JOIN section. For now it is up to 100.

Output:

enter image description here

How it works:

  1. Generate number table with CROSS JOIN
  2. Split string based on ] as delimeter
  3. RIGHT(val, LENGTH(val) - LOCATE('[', val)) remove the part up to [
  4. filter out empty records
  5. Get only DISTINCT values

Inner most query:

╔════╦══════════╗
║ id ║   val    ║
╠════╬══════════╣
║  1 ║ option[A ║
║  1 ║ sum[A    ║
║  1 ║ g3et[B   ║
║  1 ║          ║
╚════╩══════════╝

Second subquery:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ A   ║
║  1 ║ B   ║
║  1 ║     ║
╚════╩═════╝

And outermost query:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ B   ║
╚════╩═════╝

I need the result of query per row.. not combined

So add simple:

WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  AND t.id = ?

EDIT 2:

see http://sqlfiddle.com/#!9/8ee95/1 your query works partially for my data. I also changed the type to longtext.

You want to parse JSON in MySQL. As I said before parse and get value in application layer. This answer is only for demo/toys purpose and will have very low performamce.

If you still insist on SQL solution:

SELECT id, val,s.n
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val,n
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val, n.n
    FROM (SELECT id, REPLACE(col, '[]','') as col FROM tab) t
    CROSS JOIN 
    (
     SELECT e.N * 10000 + d.N * 1000 + c.N * 100 + a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e

    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
GROUP BY id, val
HAVING n <> MAX(n)
ORDER BY id,n;

SqlFiddleDemo

Output:

╔═════╦═════════════╦════╗
║ id  ║    val      ║ n  ║
╠═════╬═════════════╬════╣
║  1  ║ CE31285LV4  ║  1 ║
║  1  ║ D32E        ║  3 ║
║  1  ║ GTX750      ║  5 ║
║  1  ║ M256S       ║  7 ║
║  1  ║ H2X1T       ║  9 ║
║  1  ║ FMLANE4U4   ║ 11 ║
╚═════╩═════════════╩════╝

EDIT 3:

What exactly is done there? Why do you need n

CROSS JOIN and entire subquery is only tally table. That'is all. If MySQL has function to generate number sequence (like generate_series or prepopulated number table there is no need for CROSS JOIN.

Number table is needed for SUBSTRING_INDEX:

SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

Community
  • 1
  • 1
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • thank you. Can you explain how it works? I think you expanded my initiate intention. I need it to work on a specific string. Not on wide table... If it have 5 relevant rows in the table I will run the query 5 times. one time for each row.. I need the result of query per row.. not combined. The string datatype is `longtext` – m m May 04 '16 at 12:40
  • see http://sqlfiddle.com/#!9/8ee95/1 your query works partially for my data. I also changed the type to `longtext`. – m m May 04 '16 at 12:57
  • @mm Check updated. And for next time, please show your real data first, not oversimplified example :) – Lukasz Szozda May 04 '16 at 13:22
  • Thank you. Can you explain the cross join? What exactly is done there? Why do you need n ? and why it runs from 1 to 1000? does it means this support up to 1000 rows in result? If I read it correctly the cross join only create a sequence number series? – m m May 05 '16 at 04:40