How to use COUNT() to parse individual words in a nvarchar field?

Question

So my query:

SELECT Tags, COUNT(Tags) AS Listings
FROM Job
WHERE datepart(year, dateposted)=2013
GROUP BY Tags
ORDER BY Listings DESC

Outputs:

+----------------------+----------+
|         Tags         | Listings |
+----------------------+----------+
| java c++             |       41 |
| software development |       41 |
| java c++ c#          |       31 |
|                      |       25 |
| sysadmin             |       25 |
| see jd               |       24 |
| java c++ ood         |       23 |
| java                 |       23 |
+----------------------+----------+

I want it to come out like so:

+----------------------+----------+
|         Tags         | Listings |
+----------------------+----------+
| java                 |       118|
| c++                  |       95 |
| ood                  |       23 |
| see                  |       24 |
| jd                   |       24 |
| software development |       41 |
| sysadmin             |       25 |
| c#                   |       31 |
+----------------------+----------+

How can I count each individual word in the field instead of the entire field? The tags column is nvarchar.

Have you considered capturing each 'tag' as a separate row in a tags lookup table? — Matthew, Aug 29 '13 at 20:20
I would love to but the DB was created way before I got here and I can't change it. Read access only. — Juice, Aug 29 '13 at 20:26
I believe `Split()` generally has to be implemented as a user defined function. See: http://stackoverflow.com/questions/2647/split-string-in-sql (Next I suppose you'd want to sort the resulting values...) — , Aug 29 '13 at 20:27
How are you going to know that `software development` is one word and not two? — Taryn, Aug 29 '13 at 20:37
I don't know why my answer was down-voted if it works really great. I use it for comma-separated values. Hope it works for you too! — JGutierrezC, Aug 29 '13 at 20:47
Really?, then how are we supposed to know how are you treating the values that every tag already has? — Lamak, Aug 29 '13 at 20:49
Wow, define your question better. You can't be bothered to demonstrate meaningful results that are related in any way to your sample data? What a waste of people's time. — Aaron Bertrand, Aug 29 '13 at 20:55

Taryn · Answer 1 · 2013-08-29T21:08:28.737

First, your table structure is awful. Storing data in a list like that is going to cause you headaches similar to what you are trying to do right now.

The problem with a split function is you have no idea what software development or other multi-word tags are - Is that one word or two?

I think the only way you will solve this is by creating a table with your tags or using a derived table similar to the following:

;with cte (tag) as
(
  select 'java' union all
  select 'c++' union all
  select 'software development' union all
  select 'sysadmin' union all
  select 'ood' union all
  select 'jd' union all
  select 'see' union all
  select 'c#'
)
select c.tag, count(j.tags) listings
from cte c
inner join job j
  on j.tags like '%'+c.tag+'%'
group by c.tag

See SQL Fiddle with Demo. Using this you can get a result:

|                  TAG | LISTINGS |
|                 java |        9 |
|                  c++ |       10 |
| software development |        4 |
|             sysadmin |        2 |
|                  ood |        6 |
|                   jd |        3 |
|                  see |        2 |
|                   c# |        1 |

The issue with the above as was pointed out in the comments is how to decide if you have a tag software and development, those will match with the above query.

The best solution that you would have to this problem would be to store the tags in a separate table similar to:

create table tags
(
  tag_id int,
  tag_name varchar(50)
);

Then you could use a JOIN table to connect your jobs to the tag:

create table tag_job
(
  job_id int,
  tag_id int
);

Once you have a set up similar to this then it becomes much easier to query your data:

select t.tag_name,
  count(tj.tag_id) listings
from tags t
inner join tag_job tj
  on t.tag_id = tj.tag_id
group by t.tag_name

See demo

+1 for use of CTE, probably the best solution in this situation unless there is an existing Tag Lookup table. Like I explained in my post below, there is a potential issue if there are tags with values contained in other tags. (i.e. 'software' and 'software development') — Josh Jay, Aug 29 '13 at 21:00
@JoshJay The problem is the bad database design that the OP has. Storing data like this is an awful idea especially if you want to query it. — Taryn, Aug 29 '13 at 21:01
Agreed! Definitely the issue is bad design, I meant to say in light of the bad design this (cte) is the best solution unless there is an existing lookup table to reference in the query. — Josh Jay, Aug 29 '13 at 21:04
@JoshJay I updated my answer to include a possible better design for the OP. — Taryn, Aug 29 '13 at 21:09
FYI, I didn't do the design. Just trying to work with what was given me. — Juice, Aug 29 '13 at 21:22

Josh Jay · Answer 2 · 2013-08-29T20:48:01.743

0

You will probably need to split out the individual words.

Here's a good series on splitters in SQL Server: SqlServerCentral.com

I don't see how you will be able to differentiate "software development" as a single tag though. If you have a list of acceptable tags elsewhere, you could probably use that perform a count.

If you have a list of Available Tags, here is one approach that doesn't require a split.

Sql Fiddle Example

There could be an issue with this approach if you have a tag that is contained in another. I.e. 'software' and 'software development'

edited Aug 29 '13 at 20:48

answered Aug 29 '13 at 20:28

Josh Jay

1,240
2
14
26

The user must already be knowing that he needs to split the words, do you really think that it is an acceptable answer. – Sonam Aug 29 '13 at 20:38
@Sonam The OP did not state they need to split, which is why I included a link to a great series on splitting in SQL Server. I have also updated my answer to include another approach I thought of. I don't think there will be a great answer to this situation because the bad table design, but I'm just trying to provide ideas that might help. – Josh Jay Aug 29 '13 at 20:51

score 0 · Accepted Answer · answered Dec 04 '13 at 15:42

0

This is how I solved my issue.

SELECT TOP 50 Tags.s Tag, COUNT(Tags.s) AS Listings
FROM Job
CROSS APPLY [dbo].[SplitString](Tags,' ') Tags
WHERE NOT Job.Tags IS NULL and datepart(year,job.datecreated) = 2013
GROUP BY Tags.s
ORDER BY Listings DESC

answered Dec 04 '13 at 15:42

Juice

2,860
4
22
20

How to use COUNT() to parse individual words in a nvarchar field?

3 Answers3

Linked