0

I have a large SQL database that contains all call records from a call center for the last 15 ish years. I am working with a subset of the records (3-5 million records). There is a field that is stored as text where we store all notes from the call, emails, etc. I would like to provide a list of keywords and have the program output another label in a new column for the record. Essentially classifying each record with the likely problem.

For example, my text record contains "Hi John, thank you for contacting us for support with your truck. Has the transmission always made this noise"

The query would then be something like

If the text record contains "Truck" and "Transmission" then the new column value is "error123".

I'm not sure if doing this in SQL would be feasible as there are almost 170 different errors that need to be matched. I was also thinking it maybe could be done in Python? I'm not sure what would be the best fit for this type of tagging.

Currently, I am using PowerQuery in PowerBI to load the SQL table, and then 170 switch statements to create a calculated column. This seems to handle about 500k records before timing out. While I can chunk my records, I know this isn't the best way, but I'm not sure what program would be most suited to it.

EDIT

Per the below answer, I am going to run an update command for each error on a new column. I only have read-only access to the Database, so I am using the below code to pull the data and add a new column called "Error". My problem is that I want to use the update command to update the new "Error" column instead of the DB. Is this possible? I know the update needs a table, what would the returned query table be called? Is it possible to do it this way?

SELECT *, 'null' AS Error FROM [TicketActivity]
UPDATE 
SET Error = 'desktop'
WHERE ActivityNote LIKE '%desktop%'
   AND ActivityNote LIKE '%setup%'
  • If you want to search for words in a string type column, then you're best off looking at full text indexes. – Thom A Nov 19 '20 at 14:59

2 Answers2

0

If you just need to check for keywords, I would not take the detour through Python since you need to transfer all information from the db into Python memory and back.

I would fire 170 different versions of this with UPDATE instead of SELECT and have columns available where you can enter a True or False (or copy probable records into another table using the same approach)

C Hecht
  • 932
  • 5
  • 14
  • So I followed along with that post, and I think this is what I need to do. I also learned they only granted Read-Only access to the tables, so I cannot update the actual tables. So what I have done is select the tables, and add a new column where the new column is called "Error". My problem now is using the Update command to update the newly added column in my results and not the table. I put the code in my main post. – Thomas DeWaters Nov 19 '20 at 16:59
0

So, I figured this out through some more Googling after being pointed in the right direction here.

SELECT *,
CASE
    WHEN column1 LIKE '%keyword%' 
    AND column1 LIKE '%keyword%' THEN 'Error 123'
    WHEN column1 LIKE '%keyword%' 
    AND column1 LIKE '%keyword%' THEN 'Error 321'
    ELSE 'No Code'
END AS ErrorMessage
FROM [TicketActivity]

Repeating the WHEN statements for as many as needed, and using a WHERE statement to select my time range