SQL: how to query a set of data and count how many matched string in a given string list

Question

I build a simple question-answer system.

In my database, there are three tables:

question (
  id         int
  question varchar(200)
  answer_id  int  /* foreign key mapping to answer.id */
);

answer (
  id  int
  answer    varchar(500)
)

question_elements (
    id    int
    seq   int    /*vocabulary in question location */
    question_id    int  /** foreign key mapping to question.id */
    vocabulary  varchar(40)
)

Now I have a question:

What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?

So in table question, a record is:

question {
  id: 1,
  question:"What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?",
  answer_id:1
}

In table question_elements

question_elements [
  {
    id: 1,
    seq: 1,
    question_id: 1,
    vocabulary: "what"
  },
  {
    id: 2,
    seq: 2,
    question_id: 1,
    vocabulary: "approach"
  },
  {
    id: 3,
    seq: 3,
    question_id: 1,
    vocabulary: "should"
  },
  {
    id: 4,
    seq: 4,
    question_id: 1,
    vocabulary: "a"
  },
  {
    id: 5,
    seq: 5,
    question_id: 1,
    vocabulary: "company"
  },
  {
    id: 6,
    seq: 6,
    question_id: 1,
    vocabulary: "adopt"
  },
  {
    id: 7,
    seq: 7,
    question_id: 1,
    vocabulary: "when"
  },
  ....
  ....
  {
    id: 19,
    seq: 19,
    question_id: 1,
    vocabulary: "get"
  },
  {
    id: 20,
    seq: 20,
    question_id: 1,
    vocabulary: "funding"
  }
]

Now, when a user inputs:

What action does a company should do when it wanna get more funding with high debt ratio

My idea is to split the above statement into a string list, and execute a SQL query in order to count the matched string in table question_elements by giving the above string list.

What is the SQL statement in PostgreSQL ?

are you using json fields or that is your way of show us the data? — Juan Carlos Oropeza, Oct 29 '15 at 01:46
Looks like you have two question. One is perform split by `" "` and the other is see how many matchs. — Juan Carlos Oropeza, Oct 29 '15 at 01:48
Possible duplicate of [Split column into multiple rows in Postgres](http://stackoverflow.com/questions/29419993/split-column-into-multiple-rows-in-postgres) — Juan Carlos Oropeza, Oct 29 '15 at 01:49
Juan Carlos Oropeza, the json fields are just to show you the data. And spliting by " " is already done. so the last problem is to count how many matchs — wureka, Oct 29 '15 at 03:39
Adish, my idea is, let's say, there are now 10 question-answer pairs in database. A user inputs a question, and I split his/her question by " " into a list. In database, If I found that the third question's elements match count is the most, then I can say that the user could be asking the third question in database, so I can show him/her the third question's answer. Well, I know this logic may not be the best. But the first check point is that I need to know the match count for each question in the database. So I need to know how to build the SQL. — wureka, Oct 29 '15 at 03:45
@wureka - In MS SQL I would do it like this: Step 1 Make a subquery to split the user input on space. Step 2 Join this subquery to question_element table on question_element.vocabulary and group by questionid and get a sum of the matches and pick the top 1 after putting them in desc order. I can send you a script that works in MS SQL if it helps. — Adish, Oct 29 '15 at 04:24
@wureka instead of loading json why not use normal table format? or even better create a http://sqlfiddle.com/ that way we can work with real data and give you a working answer. — Juan Carlos Oropeza, Oct 29 '15 at 05:50

Tiago Oliveira de Freitas · Answer 1 · 2015-10-29T10:57:46.873

If I understand well, you want something like this:

WITH answer AS (
    SELECT 
        'What action does a company should do when it wanna get more funding' AS a
),
question AS (
    SELECT 'what' AS q
    UNION ALL SELECT 'should'
    UNION ALL SELECT 'a'
    UNION ALL SELECT 'company'
    UNION ALL SELECT 'do'
    UNION ALL SELECT 'when'
)
SELECT COUNT(result)
FROM (
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result
    FROM answer
) AS tbaux
WHERE result IN (select CAST(q AS VARCHAR) FROM question);

Without Text Capitalization, and some explanations:

SELECT COUNT(result)
FROM (                                                 --count how many lines have in the subquery
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result        --this break the user input in one word per line, excluding ' '
    FROM answer
) AS tbaux                                                                  --name of the subquery
WHERE upper(result) IN (select upper(CAST(q AS VARCHAR)) FROM question);    --upper turns lowercase letters in uppercase, only the line who match will remain to the COUNT()

This counts how many words from the user input is in the question table (in your case question_elements)

http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/4095/0

score 0 · Answer 2 · answered Oct 29 '15 at 15:04

The question_elements table is not necessary.

with ui(ui) as (
    values ('What action does a company should do when it wanna get more funding with high debt ratio')
)
select id, count(*) as matches, question
from
    (
        select id, question, regexp_split_to_table(question, '\s+') as word
        from question
    ) q
    inner join
    regexp_split_to_table((select ui from ui), '\s+') ui(word) using (word)
group by 1, 3
order by matches desc

SQL: how to query a set of data and count how many matched string in a given string list

2 Answers2