2

I have some columns in my db where entries are in the following form:

["some text - more text & text","some other text - more text & text"]

At that time, having strings inside arrays made sense but I now want to update these columns where I have arrays with just strings to just get the text inside the zero-eth index of the array.

So in this case capturing:

some text - more text & text

Would be lovely. I have tried many regular expressions for omitting [ " or trying to capture contents inside of "" like "(.*?)\" but I get a syntax error in mySql query as it does not support lazy capturing.

I know the before regex would still select values like "some text - more text & text" but thought that at least I would be closer to the goal and then trimming the string as substring to omit the first and last index where the "" are. Any help?

GMB
  • 216,147
  • 25
  • 84
  • 135
devfrend
  • 33
  • 5
  • I f you reölace the [] at the fornt and end with '' and do something like https://stackoverflow.com/questions/57844393/mysql-field-data-parsing/57845202#57845202 , at least the part where you split up the row, you will get rows that you can insert into a new column – nbk Sep 14 '19 at 11:29
  • 1
    Are you storing multiple values in a single entry, or is this the result from multiple rows? The former I would suggest db normalization and not doing that too much in the future, the latter I'd suggest just a plain `REGEXP` call – Rogue Sep 14 '19 at 13:47
  • hi @Rogue, I was just storing multiple values in a single entry. – devfrend Sep 14 '19 at 19:43

3 Answers3

1

To get the first nested value, one solution would be to use REGEXP_REPLACE() to remove the unwanted parts of the string.

I would suggest the following regex: (^\\[)|(")|(,.*). Basically this will match on:

  • the opening bracket
  • OR the double quotes
  • OR everything after the first comma

Consider:

SELECT REGEXP_REPLACE(
    '["some text - more text & text","some other text - more text & text"]',
  '(^\\[)|(")|(,.*)',
  ''
 ) new_value;

Demo on DB Fiddle:

| new_value                    |
| ---------------------------- |
| some text - more text & text |
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you GMB this is a good answer, unfortunately I cannot use REGEXP_REPLACE in my version of mySQL (I run 5.6) and this was added in version 8 I think. But you got the idea of what I needed. Fortunately I got an answer from @nbk which helps me with the issue and that worked as well! – devfrend Sep 14 '19 at 19:19
1

Maybe i wasn't clear in my cooemnt. When take this Table

CREATE TABLE table1 
(`id` int , `txtarray`  varchar(200))
 ;

INSERT INTO table1
    (`id`  , `txtarray` )
VALUES
    (1, '["some text - more text & text","some other text - more text & text","some other1 text - more text & text"]'),
    (2, '["some1 text - more text & text","some other2 text - more text & text","some other3 text - more text & text"]')
;

and use following Select statement

SELECT
  REPLACE(SUBSTRING_INDEX(
    SUBSTRING_INDEX(p.txtarray, ',', N.n + 1)
    , ',', -1
   ),'"','') AS part
   ,p.ID
FROM (SELECT @n := @n + 1 AS n
   FROM INFORMATION_SCHEMA.COLUMNS AS a
   CROSS JOIN INFORMATION_SCHEMA.COLUMNS AS b
   CROSS JOIN (SELECT @n := -1) AS I
   WHERE @n < 1000) N 
  CROSS JOIN (Select id, REPLACE(REPLACE(txtarray,'[',''),']','') txtarray 
              From table1) p
    WHERE 
      N.n <= (LENGTH(p.txtarray) - LENGTH(REPLACE(p.txtarray, ',', ''));

You get following result.

part                                    ID
some text - more text & text            1
some1 text - more text & text           2
some other text - more text & text      1
some other2 text - more text & text     2
some other1 text - more text & text     1
some other3 text - more text & text     2

So you have every array element in one row. and you could insert that into a new table or column

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Thank you so much nbk this worked like a charm. As I was typing this query I was no way this will work but good things happen and it ran and brought back exactly like in above. SAVED ME! @gmb's answer below is also a good one, getting the gist of what I needed, unfortunately I could not use REGEXP_REPLACE in my version of mySQL (5.6) – devfrend Sep 14 '19 at 19:17
0

As I understand your question, you just want to read the first string. Since your value looks like a valid JSON array, you can just use the following:

select str->'$[0]' from my_table;

This however requires MySQL 5.7 or above.

See demo

If you want to normalize your data in a new table - Consider the following script (Requires MySQL 8.0):

Assuming you have this data:

create table my_table (
  id int,
  str text
);

insert into my_table (id, str) values
  (1, '["some text - more text & text","some other text - more text & text"]'),
  (2, '["text 2-1","text 2-2","text 2-3"]');

Create a new normalized table:

create table my_normalized_table(
  id int,
  pos int,
  str text
);

And copy the data with:

insert into my_normalized_table (id, pos, str)
  with recursive seq as (
    select 0 as n
    union all
    select n + 1
    from seq
    where n < 99
  )
    select t.id, s.n, json_extract(t.str, replace('$[n]', 'n', s.n)) as str
    from seq s
    cross join my_table t
    having str is not null
;

Note: I'm assuming you have at most 100 strings in the array. If you can have more, you need to adjust where n < 99.

The new table will contain:

| id  | pos | str                                  |
| --- | --- | ------------------------------------ |
| 1   | 0   | "some text - more text & text"       |
| 2   | 0   | "text 2-1"                           |
| 1   | 1   | "some other text - more text & text" |
| 2   | 1   | "text 2-2"                           |
| 2   | 2   | "text 2-3"                           |

View on DB Fiddle

If your production server has an older MySQL version, you can still export the old table, import it on a MySQL 8.0 server, run the script, export the new table and import it in the production server.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53