1

I have a data table like this. A more close depiction of data.

col0  |  col1 |  col2  |  col3  | col4
======================================
milk  | egg   | juice  |        |
egg   | juice |        |        |
bread | jam   | juice  |        |
wheat | egg   | juice  |        |
jam   | juice |        |        |

I want to first search for Juice in all columns and fetch all rows where juice is found. Then I have to fetch the rows with most unique and complete data. In this case it is the first row, since it has the most unique and complete data set with no repeating words. So, from this data if I search for 'juice', then i want this result

 milk  | egg   | juice  |
 bread | jam   | juice  |
 wheat | egg   | juice  |

I have tried to GROUP BY and DISTINCT on all column combinations, but i couldn't get desired result. I used this statement

SELECT * FROM `table` 
WHERE `col4` = 'juice' OR`col3` = 'juice' OR `col2` = 'juice' OR `col1` = 'juice' 
group by `col2`,`col1`,`col0`;

I would prefer it to be in one statement instead of separate ones too.

user2649343
  • 147
  • 3
  • 11
  • What if there are more than one row with same count of most unique values ? How would you decide the tie-break in that case ? – Madhur Bhaiya Oct 04 '19 at 15:54
  • This might be easier to do if you stored your column level data in rows instead. https://stackoverflow.com/questions/8868395/sql-multi-valued-attributes – bassxzero Oct 04 '19 at 15:56
  • @MadhurBhaiya the actual data that i am working with does not allow that. The data would be in similar pattern to the shown data. – user2649343 Oct 04 '19 at 15:56
  • @user2649343 can there be duplicate values across multiple columns ? Or all the values will be unique only ? Also, when there is no value, it is represented as an empty string ? or `NULL` ? – Madhur Bhaiya Oct 04 '19 at 15:59
  • This kind of problem is indicative of poor design. Consider revising your schema. – Strawberry Oct 04 '19 at 16:13
  • @MadhurBhaiya i have edited the data set to depict more accurately the actual data. – user2649343 Oct 04 '19 at 16:14
  • .@Strawberry is right; this is a bad schema; ideally you should have these similar columns (attributes) in different rows. It is a Multi-column Antipattern; and that is why we have to invent weird queries to handle bad schema. – Madhur Bhaiya Oct 04 '19 at 16:19
  • Spreading an array across columns is asking for trouble. You found one example of why. Have a table that lists one food per row. – Rick James Oct 13 '19 at 19:05

3 Answers3

0

You can use the GROUP_CONCAT function.

It returns a string with concatenated non-NULL value from a group.

user2342558
  • 5,567
  • 5
  • 33
  • 54
0

If you always have "most complete data" in higher column number (col4 > col3 and etc),

You can simply LIMIT result to 1:

SELECT * FROM `table` 
WHERE `col4` = 'juice' OR`col3` = 'juice' OR `col2` = 'juice' OR `col1` = 'juice'
LIMIT 1;
freeek
  • 985
  • 7
  • 22
0

If there are no duplicate values across multiple columns, and empty columns are represented using NULL, then we can utilize the implicit typecasting of boolean into int by MySQL:

SELECT * 
FROM your_table 
WHERE 'juice' IN (col1, col2, col3, col4) 
ORDER BY ( (col1 IS NOT NULL) + 
           (col2 IS NOT NULL) +
           (col3 IS NOT NULL) +
           (col4 IS NOT NULL) ) DESC
LIMIT 1

Result

| col0 | col1 | col2  | col3 | col4 |
| ---- | ---- | ----- | ---- | ---- |
| milk | egg  | juice |      |      |

View on DB Fiddle

Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57