0

I have a column in which is stored nothing but text separated by one space. There may be one to maybe 5 words in each field of the column. I need a query to return all the distinct words in that column.

Tried:

SELECT DISTINCT tags FROM documents ORDER BY tags

but does not work.

To Elaborate.

I have a column called tags. In it I may have the following entries:

Row 1 Red Green Blue Yellow
Row 2 Red Blue Orange
Row 3 Green Blue Brown

I want to select all the DISTINCT words in the entire column - all fields. It would return:

Red Green Blue Yellow Orange Brown

If I counted each it would return:

2 Red
2 Green
3 Blue
1 Yellow
1 Brown
1 Orange
Bertrand Martel
  • 42,756
  • 16
  • 135
  • 159
  • *"does not work"* - Can you elaborate on that? What results are you getting now and what are the expected results? You also need to post the schema/values and relevance to php here. Edit: To which the php tag was removed in an edit. http://stackoverflow.com/revisions/43499498/2 – Funk Forty Niner Apr 19 '17 at 15:05
  • If DISTINCT didn't work, try adding GROUP BY – Funk Forty Niner Apr 19 '17 at 15:06
  • If each tag was on its own row in another table this would be the simplest thing in the world ... – Alex K. Apr 19 '17 at 15:13
  • By selecting DISTINCT on the column MYSQL will only return the distinct string of words for each field in the column. Not the DISTINCT individual word. – Lawrence P. Apr 19 '17 at 15:14
  • First splt strings into rows: http://stackoverflow.com/questions/17942508/sql-split-values-to-multiple-rows then you can group by and get counts now that you have normalized data. – xQbert May 02 '17 at 18:05

2 Answers2

0

To fix this I ended up creating a second table where all keywords where inserted on their own row each along with a record key that tied them back to the original record in the main data table. I then just have to SELECT DISTINCT to get all tags or I can SELECT DISTINCT with a WHERE clause specifying the original record to get the tags associated with a unique record. Much easier.

0

There is not a good solution for this. You can achieve this with JSON functions as of 5.6, I think, but it's a little tricky until 8.0, when mySQL added the JSON_TABLE function, which can convert json data to a table like object and perform selects on it, but how it will perform is dependent on your actual data. Here's a working example:

CREATE TABLE t(raw varchar(100));
INSERT INTO t (raw) VALUES ('this is a test');

You will need to strip the symbols (commas, periods, maybe others) from your text, then replace any white text with ",", then wrap the whole thing in [" and "] to json format it. I'm not going to give a full featured example, because you know better than I do what your data looks like, but something like this (in its simplest form):

SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') FROM t;

With JSON_TABLE, you can do something like this:

SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') INTO @delimited FROM t;

SELECT * 
FROM JSON_TABLE(
    @delimited,
    "$[*]"
    COLUMNS(Value varchar(50) PATH "$")
) d;

See this fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=7a86fcc77408ff5dfec7a805c6e4117a

At this point you have a table of the split words, and you can replace SELECT * with whatever counting query you want, probably SELECT Value, count(*) as vol. You will also need to use group_concat to handle multiple rows. Like this:

insert into t (raw) values ('this is also a test'), ('and you can test it');

select concat(
    '["', 
    replace(group_concat(raw SEPARATOR '","'), ' ', '","'), 
    '"]'
) into @delimited from t;

SELECT Value, count(*) as vol
FROM JSON_TABLE(
  @delimited,
  "$[*]"
  COLUMNS(Value varchar(50) PATH "$")
) d
GROUP BY Value ORDER BY count(*) DESC;

If you are running <8.0, you can still accomplish this, but it will take some hackiness, like generating an arbitrary list of numbers and constructing the paths dynamically from that.

Chris Strickland
  • 3,388
  • 1
  • 16
  • 18