Using UNION to combine same SQL Tables and searching for a specific word return duplicate rows

Question

I want to do the same[ on logic ] as this question -> SQL/mysql - Select distinct/UNIQUE but return all columns?

Using SELECT DISTINCT PATH FROM... instead of SELECT * FROM doesn't returns duplicates [ UNION works ] but it does return only one table column [PATH] , i want all the columns of the tables not only the Path column .

**Basically what i want to do is sql select multiple columns with only one distinct column ** the column PATH

Here is the SQLITE3 db file ->https://github.com/goxr3plus/Tester

Example sql statements for the db :

SELECT * FROM '_44057' UNION SELECT * FROM '_76468' UNION SELECT * FROM '_25912' UNION SELECT * FROM '_12603'  WHERE PATH LIKE '%e%';

SELECT * FROM '_44057' UNION SELECT * FROM '_76468' UNION SELECT * FROM '_25912' UNION SELECT * FROM '_12603'  WHERE PATH LIKE '%e%' GROUP BY PATH;

I am using SQLITE and i have 4 same tables with the same data and colums . So the tables contains the columns (PATH->this is primary key,STARS,...).

Actually here is the Table create statement :

"CREATE TABLE '" + dataBaseTableName + "'" 
                 + "(PATH       TEXT    PRIMARY KEY   NOT NULL ,"
                 + "STARS       DOUBLE     NOT NULL,"
                 + "TIMESPLAYED  INT     NOT NULL,"
                 + "DATE        TEXT    NOT NULL,"
                 + "HOUR        TEXT    NOT NULL)");

What i want to do is search for those items-songs that contain the specific word , for example a. So i am using UNION to collect all the tables data and finally a WHERE STATEMENT with a LIKE , finally i limit my results to 50. I also don't want dublicates but for some reason i am getting duplicate rows ...

The problems:

1)Dublicate rows are being returned

2)When tables contains a lot of elements it is getting slow

Small Explanation for code below:

Where you can see '%" + word + "%' , word is a String for example a
The below SQL Statement is dynamically generated , i mean that i can have more that 4 tables and each table can contain thousands of songs [ That makes the query very slow sometimes )

Simple SQL code:

SELECT * FROM '_44057'
 UNION 
SELECT * FROM '_76468' 
UNION 
SELECT * FROM '_25912' 
UNION 
SELECT * FROM '_12603' 
WHERE PATH LIKE '%" + word + "%' LIMIT 50

Finally:

Any recommendation , i mean anything from LIKE to Selector UNION replacing is appreciated a lot cause i am new to this :).

Image from Duplicate rows:

use `select distinct * from _60588,_40174,_15371,_64559 WHERE PATH LIKE '%" + word + "%' LIMIT 50` — XtremeBaumer, Apr 26 '17 at 14:22
@XtremeBaumer Thanks for recommentation , i tested it and it returns the same results as the image , again duplicate ... The PATH is the primary key :) — GOXR3PLUS, Apr 26 '17 at 14:26
@Berger Yes , ou now i saw that XtremeBaumer has changed it's comment i will test this .. Any recommendation is appreciated :) — GOXR3PLUS, Apr 26 '17 at 14:28
can you post the ddl of your tables?. probably you can use a join and group by Path — XtremeBaumer, Apr 26 '17 at 14:29
@XtremeBaumer Yes i have added it on the question , i also tried your updated comment so the produced code was `SELECT DISTINCT * FROM '_44057','_76468','_25912','_12603' WHERE PATH LIKE '%a%' LIMIT 50` but i get this error `org.sqlite.SQLiteException: [SQLITE_ERROR] SQL error or missing database (ambiguous column name: PATH)` — GOXR3PLUS, Apr 26 '17 at 14:35
this is because all tables have the column PATH. i am trying to make an query atm whcih works — XtremeBaumer, Apr 26 '17 at 14:37
@Berger Each table alone has no duplicates , when i combine them they do if you mean that :)... — GOXR3PLUS, Apr 26 '17 at 14:41
Don't use `select *` if you want to avoid duplicates or make sure of what is the desc of all tables and do show the result of all columns obtained from `select *` — Serge Ballesta, Apr 26 '17 at 14:48
if you would store the data in a i believe it was a `Set`, and use path as key, all duplicates would be eliminated (if you don't get a query to work) — XtremeBaumer, Apr 26 '17 at 14:49
@XtremeBaumer Yes but it is extra cost in memory , garbage collector and cpu [ Especially when i query thousands of songs ] . A native approach using SQL will be perfect ... . Let's assume that i use the SET and it finds 4 duplicates so the items get from 50 to 48 , then i query again [Overkill ] how many times that might happen . — GOXR3PLUS, Apr 26 '17 at 14:52
Replace UNION with UNION ALL. But for searching words, you would need [FTS](http://www.sqlite.org/fts3.html). — CL., Apr 26 '17 at 14:55
@CL. Hello CL. i have read the paper on your link . I am very curious to know why UNION is not removing duplicates , when it is supposed to do it . I mean all that UNION are returning a query , let's remove the WHERE statement . Why that query is returning duplicates at the first place ... i mean how i can eliminate them before entering the WHERE Statement :) — GOXR3PLUS, Apr 26 '17 at 15:07
@XtremeBaumer I think i found something . In order the UNION to work the duplicate rows must be completely the same ? For example here they must have the same PATH,STARS,DATE,TIME etc ? — GOXR3PLUS, Apr 26 '17 at 15:12
Can you make an [mcve] restricted to commandline tool sqlite3? Seems to me to be possible, since your question relates only to the SQL not to code around it. To make a good sqlite mcve create a toy database, filled with the right data to reproduce your problem. then do a `.dump` and quote the query showing the problem. — Yunnosch, Apr 26 '17 at 15:19
@Yunnosch Hello Yunnosch , i am trying to make it as minimal as possible . The database is dynamically generated , the expression also . I am using Java code for this , i see the duplicates though `System.out.println(...)` and actually through the application . I think i have added all the information on the question :) — GOXR3PLUS, Apr 26 '17 at 15:24
That is why I recommend to make a commandline-only mcve, with a specially created toy database, filled with suitable example data and a query to show your problem (and nothing else). You are able to dump the data which made your screenshot? You are able to find the database structure? Use it to make the toy. I think if you are able to do that, people will be able to help you find the problem. Actually I believe that you will find it yourself while making that mcve. Do not allow your GUI or your language API to hide the details of the problem from you. I think other comments mean the same. — Yunnosch, Apr 26 '17 at 15:29
@CL Would not switching from `UNION` to `UNION ALL` **increase** the duplicates? — Yunnosch, Apr 26 '17 at 15:33
@Yunnosch I found which the problem is but not a complete solution: Using `SELECT DISTINCT PATH FROM...` instead of `SELECT * FROM` doesn't returns duplicates [ `UNION` works ] but it does return only one table column [`PATH`] , i want all the columns of the tables not only the `Path` column . — GOXR3PLUS, Apr 26 '17 at 15:35
@XtremeBaumer I found which the problem is but not a complete solution: Using `SELECT DISTINCT PATH FROM...` instead of `SELECT * FROM` doesn't returns duplicates [ `UNION` works ] but it does return only one table column [`PATH`] , i want all the columns of the tables not only the `Path` column . — GOXR3PLUS, Apr 26 '17 at 15:36
@Yunnosch To me now it's clear **Basically what i want to do is `SQL Select All COLLUMNS Distinct only one COLUMN` the column PATH** :) — GOXR3PLUS, Apr 26 '17 at 15:44
I am pretty sure that I can write a query to satisfy you - if and when you provide an mcve. I could try to make that myself, but then my answer would be downvoted and the probability to achieve exactly what you want would be low. Please make an mcve. Pretty please. I so much want to help you and am vain enough to be convinced that I will be able to. Sniff. Sob. — Yunnosch, Apr 26 '17 at 16:42
At least post a few lines `insert into _60588 values ...; insert into _40174 values ...;` — Yunnosch, Apr 26 '17 at 16:46
@Yunnosch Updated the question and also uploaded a simple small dbFile.db of SQLITE3 ( Here is the SQLITE3 db file ->https://github.com/goxr3plus/Tester ) so you can test it with the code i have added in the Question . You will see that it returns duplicates for a or b letters . As for mcve , the structure of the table is defined on the question . Dunno what other to provide :) . Thanks for your effort !! Post whatever answer you want nobody is gonna downvote :) — GOXR3PLUS, Apr 26 '17 at 17:41
I admit that I now only would appreciate the convenience of a simple `.dump` and the fact that it would be within StackOverflow. But yes, I guess now all info is present. Let me try. — Yunnosch, Apr 26 '17 at 17:44
@Yunnosch I have updated the db File to produce even more duplicates . I didn't said also that inside the database some other tables exist also ( you can ignore them ) and the PATH represents absolute paths of song files on the user computer . — GOXR3PLUS, Apr 26 '17 at 17:56
@Yunnosch I am trying it online on https://sqliteonline.com/ , pretty awesome site :) — GOXR3PLUS, Apr 26 '17 at 18:22
@Yunnosch After multiple tests , the UNION would work like here http://stackoverflow.com/questions/14303573/join-two-different-tables-and-remove-duplicated-entries but the problem is that two rows can have the same PATH but different STARS or DATE or TIME hm.. — GOXR3PLUS, Apr 26 '17 at 18:58
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/142763/discussion-between-yunnosch-and-goxr3plus). — Yunnosch, Apr 26 '17 at 19:00
@XtremeBaumer Reading all comments, I just noticed that what I consider the main trick was mentioned by you - and earlier. Credits for that; though I thought of it myself, too. — Yunnosch, Apr 26 '17 at 20:25

score 1 · Accepted Answer · edited May 23 '17 at 11:55

This is the SQLite part of the solution, please wrap into java yourself:

SELECT * FROM (
  SELECT * FROM _12603 UNION ALL 
  SELECT * FROM _25912 UNION ALL 
  SELECT * FROM _44057 UNION ALL 
  SELECT * FROM _76468
) 
WHERE 
REPLACE(path, rtrim(path, replace(path, '\', '')), '') -- only filename
LIKE '%e%' GROUP BY path;

There is no explicit control on which line of identical paths is used.
OP has stated (in chat) that any one is fine, no need for special selection.
So just using group by is good enough.
However, comments are welcome on whether this (using group by without aggregate function) is clean/portable/reliable.

I tested with OPs mcve and
SQLite 3.18.0 2017-03-28 18:48:43 424a0d380332858ee55bdebc4af3789f74e70a2b3ba1cf29d84b9b4bcf3e2e37

OP tried on https://sqliteonline.com and was satisfied.

Just to mention: in the chat OP spotted an unneeded workaround (for some silliness of mine), which improved efficiency.

Using totally amazing answer by ungalcrys to
How to get the last index of a substring in SQLite?

@ungalcrys Just to make sure you get the "compliment by copying". — Yunnosch, Apr 26 '17 at 20:07
For anybody using this and needing speed: OP noticed a drop in speed due to the use of rtrim for searching only in filenames (not path). OP considers to introduce a filename-only column to get around that. — Yunnosch, Apr 26 '17 at 20:11
I will leave the bounty open some days in case some one improves the speed of the `REPLACE(path, rtrim(path, replace(path, '\', '')), '')` or generally your answer :) , then i will accept it . — GOXR3PLUS, Apr 29 '17 at 00:03

Using UNION to combine same SQL Tables and searching for a specific word return duplicate rows

1 Answers1