1

I am currently trying to overhaul the current keyword search I have put into the scripting software we use (the old one was vey basic and cumbersome) to something more refined. There are limitations to the software we use, and IT are tied up, so I do not have the option of creating a function or stored procedure which I appreciate would be the ideal solution.

The situation is that the end user might be using a generic script, when they should be using a specific script for their issue, so I want to create some SQL that detects if they should have gone to a different script using a keyword search.

I have a list of words associated with each script I feel they should be using, for example:

Repair script keywords: repair, broken, chasing

Estate script keywords: dirty, cleaning, garden

What I want to do is to make some SQL that assigns a numerical value of 1 to each instance of these words within the databox '{Script.Details01}', and then works out which set of keywords has the highest tally at the end.

This is what I have tired so far, I know it's likely not working due to the syntax. Sadly the software we are using is pretty vague when giving error messages so it's not much help. We are using aliases of V and D. D is the user display, so what they physically see, V is the value that the system reads and is not seen by the user. The databox which is where the string we are searching from is '{Script.Details01}'. As this information is stored within the software virtually we do not have to use the FROM field as we would do normally when referencing this location.

SELECT 'GO TO DIFFERENT SCRIPT' D, 'GO TO DIFFERENT SCRIPT' V,
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%repair%') THEN 1 ELSE 0 END +
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%broken%') THEN 1 ELSE 0 END +
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%chasing%') THEN 1 ELSE 0 END AS REP
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%dirty%') THEN 1 ELSE 0 END +
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%cleaning%') THEN 1 ELSE 0 END +
        CASE WHEN EXISTS(SELECT '{Script.Details01}'  WHERE '{Script.Details01}' like '%garden%') THEN 1 ELSE 0 END AS EST
WHERE REP = (SELECT MAX(REP)) AND REP <> 0 AND > EST
OR EST = (SELECT MAX(EST)) AND EST <> 0 AND > REP

Essentially what I'm looking for the code to do is to tell me if there is a higher tally for REP (repair) and EST (estate) or if there are no values registered against either. Apologies if I have not explained this well, there are a few restrictions within the software we are using so trying to explain it as best I can. Any ideas would be greatly appreciated.

  • Forgot to mention, information in that Details01 databox could be a string such as: "Caller wanting to complain about the repair they have chasing for days, as their boiler is broken". I would want this to pull through as Needing to go to the repair script. – Craig Jones Dec 30 '16 at 11:54
  • 1
    Please add appropriate tag for DBRMS and version. – user1429080 Dec 30 '16 at 12:06
  • Apologies, the software we are using is called Keyfax, but it is not a commonly used software, but it is referencing SQL server 2012, I have updated the tags now accordingly – Craig Jones Dec 30 '16 at 12:08

3 Answers3

2

Here is a first stab at solving this to some extent. It uses a CTE which might or might not be feasible in your case:

declare @inputText nvarchar(2000)
set @inputText = 'Caller wanting to complain about the repair they have chasing for days, as their boiler is garden broken and needs repair'

--------

declare @inputText nvarchar(2000)
set @inputText = 'Caller wanting to complain about the repair they have chasing for days, as their boiler is garden broken and needs repair'

;with SplitIntoWords(DataItem, Data) as (
    select cast(LEFT(@inputText, CHARINDEX(' ', @inputText + ' ') - 1) as nvarchar(2000)),
        cast(STUFF(@inputText, 1, CHARINDEX(' ', @inputText + ' '), '') as nvarchar(2000))
    union all
    select cast(LEFT(Data, CHARINDEX(' ', Data + ' ') - 1) as nvarchar(2000)),
        cast(STUFF(Data, 1, CHARINDEX(' ', Data + ' '), '') as nvarchar(2000))
    from SplitIntoWords
    where Data > ''
)
select  (
    select count(*) from SplitIntoWords where DataItem in ('repair','broken','chasing')
) as rep,
(
    select count(*) from SplitIntoWords where DataItem in ('dirty','cleaning','garden')
) as est,
(
    select count(*) from SplitIntoWords where DataItem not in ('dirty','cleaning','garden','repair','broken','chasing')
) as other

Note: The CTE is an adaptation of a CTE in this answer.

So what does it actually do? It uses the CTEto split the input text into individual words, then it searches for the given keywords and performs a count. Output from sample:

+-----+-----+-------+
| rep | est | other |
+-----+-----+-------+
|   4 |   1 |    16 |
+-----+-----+-------+

So it has found 1 keyword which belongs to the Estate group, 4 keywords from the Repair group, and 16 others.

To adapt the sample for your use case, replace @inputText with '{Script.Details01}'.

--- EDIT ---

Try this then:

;with SplitIntoWords(DataItem, Data) as (
    select cast(LEFT(@inputText, CHARINDEX(' ', @inputText + ' ') - 1) as nvarchar(2000)),
        cast(STUFF(@inputText, 1, CHARINDEX(' ', @inputText + ' '), '') as nvarchar(2000))
    union all
    select cast(LEFT(Data, CHARINDEX(' ', Data + ' ') - 1) as nvarchar(2000)),
        cast(STUFF(Data, 1, CHARINDEX(' ', Data + ' '), '') as nvarchar(2000))
    from SplitIntoWords
    where Data > ''
)
select top 1 scriptType, count(*) as typeCount
from (
    select case when DataItem in ('repair','broken','chasing') then 'rep' when DataItem in ('dirty','cleaning','garden') then 'est' else '' end as scriptType,
        DataItem
    from SplitIntoWords
) as sub
where scriptType != ''
group by scriptType
order by count(*) desc

Output:

+------------+-----------+
| scriptType | typeCount |
+------------+-----------+
| rep        |         4 |
+------------+-----------+

--- ONE MORE EDIT, Wildcard searching ---

Replace

    select case when DataItem in ('repair','broken','chasing') then 'rep' when DataItem in ('dirty','cleaning','garden') then 'est' else '' end as scriptType,

with

    select 
        case when 
            DataItem like '%repair%' 
            or DataItem like '%broken%'
            or DataItem like '%chasing%' then 'rep'
        when 
            DataItem like '%dirty&'
            or DataItem like '%cleaning%'
            or DataItem like '%garden%' then 'est'
        else '' end as scriptType,
Community
  • 1
  • 1
user1429080
  • 9,086
  • 4
  • 31
  • 54
  • thank you for taking the time to come up with this. I was not aware of CTE's before, I'm still relatively new to SQL, so learning all the time. This sounds like it is what I need. However one of the restrictions of the software is that it will not allow multiple rows to be displayed, so I will need it to only pull through the result with the highest count, excluding the total word count. I assume this would involve a SELECT TOP 1 somewhere in the code, but not sure where this would sit with this syntax? – Craig Jones Dec 30 '16 at 13:00
  • You are quickly becoming the hero of the day! Very close to the final result I'm looking for. I have deleted the last section, as I don't currently have use for a count on the other words, now I just need it to select which one has the highest count and only display this and which column it's in relation to. I will have a play about with it, as I am understanding a bit more about how this statement is constructed – Craig Jones Dec 30 '16 at 13:38
  • @CraigJones See one more edit. Note: If there are exactly as many `rep` words as `est` words, the final output might be any *one* of them. Also if the input string contains *no* keyword, the final result is that no row is returned. – user1429080 Dec 30 '16 at 13:48
  • That's great, I will have a play about with it from here. I just wondered if it would be possible to use wildcard searches within this syntax, as opposed to using the IN operator? Otherwise this works perfectly, thank you again for your time. – Craig Jones Dec 30 '16 at 13:56
2

I don't know if you can store the keywords in the database, but that would imho be preferable over hard coding. That way you could keep the maintenance of the keywords outside the function. The t-sql below works as is, but the keywords table can be ported to the db itself instead:

declare @keywords table(word varchar(50), type varchar(10)) --preferrably when stored in db, the type would be normalized to another table
insert into @keywords values
    ('repair', 'Rep'),
    ('broken', 'Rep'),
    ('chasing', 'Rep'),
    ('dirty', 'EST'),
    ('cleaning', 'EST'),
    ('garden', 'EST')

declare @teststring varchar(512) = 'When the film "chasing cars" was broken, we tried to repair it. It turned out it was dirty from lying in the garden, so we gave it a thorough cleaning' 

select top 1 k.type, COUNT(1) cnt from @keywords k where  @teststring like '%' + k.word + '%' group by k.type order by COUNT(1) desc

For each word inside the table, a like is performed (wildcards can be used inside the words as well). The group by checks the number of occurrences for each type and the order by and top 1 make sure you only get the one with most occurences (you could add an extra gravity to the types to sort on, in case both types have an equal number of occurences)

edit Since storing in the db isn't an option: the same can also be done without a table variable:

select top 1 k.type from (values
    ('repair', 'Rep'),
    ('broken', 'Rep'),
    ('chasing', 'Rep'),
    ('dirty', 'EST'),
    ('cleaning', 'EST'),
    ('garden', 'EST')
     ) k(word,type) 
 where  @teststring like '%' + k.word + '%' group by k.type order by COUNT(1) desc
Me.Name
  • 12,259
  • 3
  • 31
  • 48
  • Thanks for that. Sadly I don't have access to do this. Essentially my role is not quite IT, and in an ideal situation I would use something like this (which I know is much simpler and elegant!) however our IT resource is stretched so thin currently that things like this just won't get done, so trying to be proactive and find solutions. I appreciated the statement about chasing cars though, did make me laugh. – Craig Jones Dec 30 '16 at 14:03
  • The code can be used as is. Just wanted to mention the idea of porting it to the database. And couldn't resist making a sentence with all the words :p – Me.Name Dec 30 '16 at 14:05
  • You are correct! Didn't realise you could set it up as a temporary table in that way. SQL = 'ManyWaysToSkinACat'. Having a play about with this, just wondering if there was a way to get this to display only the Column heading with the highest count, ie just Rep or EST? Tried how I would normally do this and it seems to have effected the binding of K.Type and K.Word – Craig Jones Dec 30 '16 at 14:15
  • It should only display the highest count as it is now (the top 1 and order by count(1) desc). e.g. the above only returns `Rep 3` And if one of the Rep words is removed, the same returns `EST 3`, etc. Or do you mean you want the column header itself to be Rep/Est ? – Me.Name Dec 30 '16 at 14:19
  • I meant just the column header to display, so if EST = 2 and REP = 3 then I would want it to display just REP. Having the count in there is useful, and I will be exploring an additional use for that, just wanting to see if it's possible just to display REP – Craig Jones Dec 30 '16 at 14:27
  • This is much more elegant than my CTE version. It does not however as it stands count multiple occurrences of the same keyword. Just a note.. – user1429080 Dec 30 '16 at 14:34
  • In that case the `Count(1)` in the `select` part can simply be omitted (edited in the bottom example). You might want to get the result in a variable. In that case you can use something like `declare @type varchar(10) select top 1 @type = k.type from (values --etc...` – Me.Name Dec 30 '16 at 14:35
  • @user1429080 Good point. @ Craig, if the same word occurs multiple times, how should that be counted? – Me.Name Dec 30 '16 at 14:37
0

Try this, note that I have a sample text for every keywords. you need to replace these with the one you have:

DECLARE @repairtext AS varchar(max);
SET @repairtext = 'repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @repair int,@RepairCounter int
SET @repair= PATINDEX('%repair%',@repairtext)
IF @repair<>0  SET @Repaircounter=1
WHILE @repair<>0
BEGIN
SET @repairtext = STUFF(@repairtext,1,@repair+6,'')
SET @repair= PATINDEX('%repair%',@repairtext)
IF @repair<>0  SET @RepairCounter=@RepairCounter+1
END;

DECLARE @brokentext AS varchar(max);
SET @brokentext = 'repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @broken int,@BrokenCounter int
SET @broken= PATINDEX('%broken%',@brokentext)
IF @broken<>0  SET @BrokenCounter=1
WHILE @broken<>0
BEGIN
SET @brokentext = STUFF(@brokentext,1,@broken+6,'')
SET @broken= PATINDEX('%broken%',@brokentext)
IF @broken<>0  SET @BrokenCounter=@BrokenCounter+1
END;

DECLARE @chasingtext AS varchar(max);
SET @chasingtext = 'repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @chasing int,@chasingCounter int
SET @chasing= PATINDEX('%chasing%',@chasingtext)
IF @chasing<>0  SET @chasingCounter=1
WHILE @chasing<>0
BEGIN
SET @chasingtext = STUFF(@chasingtext,1,@chasing+7,'')
SET @chasing= PATINDEX('%chasing%',@chasingtext)
IF @chasing<>0  SET @chasingCounter=@chasingCounter+1
END;

DECLARE @dirtytext AS varchar(max);
SET @dirtytext ='repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @dirty int,@dirtyCounter int
SET @dirty= PATINDEX('%dirty%',@dirtytext)
IF @dirty<>0  SET @dirtyCounter=1
WHILE @dirty<>0
BEGIN
SET @dirtytext = STUFF(@dirtytext,1,@dirty+5,'')
SET @dirty= PATINDEX('%dirty%',@dirtytext)
IF @dirty<>0  SET @dirtyCounter=@dirtyCounter+1
END;

DECLARE @cleaningtext AS varchar(max);
SET @cleaningtext = 'repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @cleaning int,@cleaningCounter int
SET @cleaning= PATINDEX('%cleaning%',@cleaningtext)
IF @cleaning<>0  SET @cleaningCounter=1
WHILE @cleaning<>0
BEGIN
SET @cleaningtext = STUFF(@cleaningtext,1,@cleaning+8,'')
SET @cleaning= PATINDEX('%cleaning%',@cleaningtext)
IF @cleaning<>0  SET @cleaningCounter=@cleaningCounter+1
END;

DECLARE @gardentext AS varchar(max);
SET @gardentext = 'repair, broken, chasing, garden dirty, cleaning, garden'
DECLARE @garden int,@gardenCounter int
SET @garden= PATINDEX('%garden%',@gardentext)
IF @garden<>0  SET @gardenCounter=1
WHILE @garden<>0
BEGIN
SET @gardentext = STUFF(@gardentext,1,@garden+6,'')
SET @garden= PATINDEX('%garden%',@gardentext)
IF @garden<>0  SET @gardenCounter=@gardenCounter+1
END;

DECLARE @REP int = @RepairCounter + @BrokenCounter + @chasingCounter
,@EST int = @dirtyCounter + @cleaningCounter + @gardenCounter;

IF @REP > @EST
SELECT @REP AS REP
ELSE IF  @REP < @EST
 SELECT @EST AS EST
 ELSE SELECT @REP AS REP;