-1

I have a student table which looks something like this:

 id  |  name  |  school_descriptors
-------------------------------------------------------
 1   |  Rob   |  Comp Sci,Undergraduate,2020  
 2   |  Tim   |  Business,MBA,2022
 3   |  Matt  |  Business,MBA,2022
 4   |  Jack  |  Law,Masters,2024
 5   |  Steph |  Comp Sci,Masters,2022  

The school_descriptors field is just one column, and stores information about the Course, Qualification and Graduation year as a comma-delimited string. (it's terribly designed and I wish it could be split up into its own fields, but it can't right now (I am not the database owner))

I want to provide an interface where teachers can quickly find students that match certain Course, Qualifications and Graduation years, and thus would like to create relevant queries.

Question 1: For example, I would like a teacher to be able to select from the UI: "Business", "MBA" and get returned students with ID 2 and 3. Specifically, an example question I have is: Find students who are in the Business Course and doing the MBA qualification:

SELECT * FROM student_table WHERE school_descriptors LIKE '%Business%' AND school_descriptors LIKE '%MBA%'

The query I have in mind is a basic LIKE query, but I can't help but think there is a more efficient query that can take advantage of the fact that the school_descriptor string is 1) always in a specific order (e.g. course, qualification, graduation), and 2) comma-delimited, and thus could be perhaps split. The table currently sits at ~5000 rows so relatively small but is expected to grow.

Related question 2: Find students who are in the Comp Sci Course and graduating after 2019:

Would it be possible to split the school_descriptors field and add a >2019 operand?

Many thanks!

Rohan
  • 455
  • 1
  • 3
  • 11
  • 3
    "*I wish it could be split up into its own fields, but it can't right now :)*" Why not? This seems to be a pretty arbitrary requirement. This problem would be at least an order of magnitude easier to deal with if you did take the time to properly normalize this table. – esqew Sep 25 '20 at 17:57
  • 2
    See also [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad/3653574#3653574) – Bill Karwin Sep 25 '20 at 17:58
  • @Luuk the question's tag is MySql, not SQL Server. – forpas Sep 25 '20 at 18:01
  • Thanks for all the comments, unfortunately I am not the database owner and am creating the UI for a school who is a client which is why I am unable to change the table config. – Rohan Sep 25 '20 at 18:06
  • Function FIND_IN_SET searches string literal in CSV list (strict quiality). – Akina Sep 25 '20 at 18:07
  • "unfortunately I am not the database owner" Please find the database owner and tell to change it. Otherwise you will be the one who lives with nightmare. – Eric Sep 25 '20 at 21:45
  • Really appreciate all the feedback! After reading all your comments, my priority now is to get the database normalized, while using one of the substring_index queries as a very interim solution. – Rohan Sep 26 '20 at 00:24

3 Answers3

1
select id, name, 
  substring_index(school_descriptors,',',1) as course, 
  substring_index(substring(school_descriptors,length(substring_index(school_descriptors,',',1))+2,200),',',1) as Qualifications, 
  substring_index(school_descriptors,',',-1) as year
from student;

output:

+------+-------+----------+----------------+------+
| id   | name  | course   | Qualifications | year |
+------+-------+----------+----------------+------+
|    1 | Rob   | Comp Sci | Undergraduate  | 2020 |
|    2 | Tim   | Business | MBA            | 2022 |
|    3 | Matt  | Business | MBA            | 2022 |
|    4 | Jack  | Law      | Masters        | 2024 |
|    5 | Steph | Comp Sci | Masters        | 2022 |
+------+-------+----------+----------------+------+

A link to the docs, in case you want to know about SUBSTRING_INDEX()

Luuk
  • 12,245
  • 5
  • 22
  • 33
1

In MySql you can use the function SUBSTRING_INDEX() to split the column school_descriptors.
This will work only if the positions of Course, Qualification and Graduation year are fixed.

select *,
  substring_index(school_descriptors, ',', 1) Course, 
  substring_index(substring_index(school_descriptors, ',', 2), ',', -1) Qualification, 
  substring_index(school_descriptors, ',', -1) Graduation
from student_table 

See the demo.
Results:

> id | name  | school_descriptors          | Course   | Qualification | Graduation
> -: | :---- | :-------------------------- | :------- | :------------ | :---------
>  1 | Rob   | Comp Sci,Undergraduate,2020 | Comp Sci | Undergraduate | 2020      
>  2 | Tim   | Business,MBA,2022           | Business | MBA           | 2022      
>  3 | Matt  | Business,MBA,2022           | Business | MBA           | 2022      
>  4 | Jack  | Law,Masters,2024            | Law      | Masters       | 2024      
>  5 | Steph | Comp Sci,Masters,2022       | Comp Sci | Masters       | 2022    
forpas
  • 160,666
  • 10
  • 38
  • 76
1

Answer 1:

SELECT * FROM student_table WHERE school_descriptors REGEXP ['Business','MBA']

By using this query you can get all the records that are having Business OR MBA. If you want to select only Business, MBA you can try like this

SELECT * FROM student_table WHERE school_descriptors LIKE '%Business,MBA%'

Answer 2:

SELECT * 
FROM student 
WHERE
  SUBSTRING_INDEX(SUBSTRING_INDEX(school_descriptors , ',', 1), ',', -1)='Comp Sci' 
  AND
  SUBSTRING_INDEX(SUBSTRING_INDEX(school_descriptors , ',', 3), ',', -1)> 2019;
Slava Rozhnev
  • 9,510
  • 6
  • 23
  • 39
Murali
  • 35
  • 5