2

I'm trying to get a count on how many times BNxxxx has been commented in the comments cell. So far, I can make each cell be counted once, but there may be multiple comments in a cell containing BNxxxx.

For example, this:

------- 
BN0012
------- 
BN0012
------- 
BN0012
BN0123
-------

should show an output of BN0012 3 times and BN0123 once. Instead, I get BN0012 3 times only.

Here's my code:

select COMMENTS, count(*) as TOTAL 
from NOTE 
Where COMMENTS like '%BN%' AND CREATE_DATE between '01/1/2015' AND '11/03/2015'
group by COMMENTS
order by Total desc; 

Any ideas?

edit

My code now looks like

select BRIDGE_NO, count(*)
from IACD_ASSET b join 
IACD_NOTE c
on c.COMMENTS like concat(concat('BN',b.BRIDGE_NO),'%')
Where c.CREATE_DATE between '01/1/2015' AND '11/03/2015' AND length(b.BRIDGE_NO) > 1 
group by b.BRIDGE_NO
order by count(*);

Problem with this is the BN44 is the same as BN4455 .. have tried concat(concat('BN',b.BRIDGE_NO),'_') comes back with nothing , any ideas how i can get exact likes

Aldwoni
  • 1,168
  • 10
  • 24
cain2060
  • 25
  • 4

3 Answers3

3

You have a problem. Let me assume that you have a table of all known BN values that you care about. Then you can do something like:

select bn.fullbn, count(*)
from tableBN bn join
     comments c
     on c.comment like ('%' || bn.fullbn || '%')
group by bn.fullbn;

The performance of this might be quite slow.

If you happen to be storing lists of things in the comment field, then this is a very bad idea. You should not store lists in strings; you should use a junction table.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I think this will be the solution, just getting some oracle syntax errors, which i assume is to do with how the join is being performed or the concat command – cain2060 Mar 11 '15 at 02:32
  • This is working, with , instead of pipes.. With the like statement, i want comments that have "BN" before the number, as the BN table only has numbers. i tried ('BN',b.BRIDGE_NO, '%'), dosnt seem to give the desired result. any ideas – cain2060 Mar 11 '15 at 03:37
  • Use ('%BN' || bn.fullbn || '%') – psaraj12 Mar 11 '15 at 04:37
  • hey psarag12, thank for that, did try that the problem with this is the BN44 is the same as BN4455 – cain2060 Mar 11 '15 at 05:02
  • @cain2060, that is why I would suggest using regexes instead of simple `LIKE`. `LIKE` can't be made to be greedy, so it won't choose to match `BN4455` over `BN44`. – David Faber Mar 11 '15 at 13:20
1

I'm going to assume that your COMMENTS table has a primary key column (such as comment_id) or at least that comments isn't a CLOB. If it is a CLOB then you're not going to be able to use GROUP BY on that column.

You can accomplish this as follows without even a lookup table of BN.... values. No guarantees as to the performance:

WITH d1 AS (
    SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
         , date'2015-01-03' AS create_date
      FROM dual
     UNION ALL
    SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
         , date'2015-02-03' AS create_date
      FROM dual
)
SELECT comment_id, comments, COUNT(*) AS total FROM (
    SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
      FROM d1
     WHERE create_date >= date'2015-01-01'
       AND create_date < date'2015-11-04'
   CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
       AND PRIOR comment_id = comment_id
       AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL
) GROUP BY comment_id, comments;

Note that I corrected your filter:

CREATE_DATE between '01/1/2015' AND '11/03/2015'

First, you should be using ANSI date literals (e.g., date'2015-01-01'); second, using BETWEEN for dates is often a bad idea as Oracle DATE values contain a time portion. So this should be rewritten as:

    create_date >= date'2015-01-01'
AND create_date < date'2015-11-04'

Note that the later date is November 4, to make sure we capture all possible comments that were made on November 3.

If you want to see the matched comments without aggregating the counts, then do the following (taking out the outer query, basically):

WITH d1 AS (
    SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
         , date'2015-01-03' AS create_date
      FROM dual
     UNION ALL
    SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
         , date'2015-02-03' AS create_date
      FROM dual
)
SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
  FROM d1
 WHERE create_date >= date'2015-01-01'
   AND create_date < date'2015-11-04'
CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
   AND PRIOR comment_id = comment_id
   AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL;

Given the edits to your question, I think you want something like the following:

SELECT b.bridge_no, COUNT(*) AS comment_cnt
  FROM iacd_asset b INNER JOIN iacd_note c
    ON REGEXP_LIKE(c.comments, '(^|\W)BN' || b.bridge_no || '(\W|$)', 'i')
 WHERE c.create_dt >= date'2015-01-01'
   AND c.create_dt < date'2015-03-12' -- It just struck me that your dates are dd/mm/yyyy
   AND length(b.bridge_no) > 1
 GROUP BY b.bridge_no
 ORDER BY comment_cnt;

Note that I am using \W in the regex above instead of \s as I did earlier to make sure that it captures things like BN1234/BN6547.

David Faber
  • 12,277
  • 2
  • 29
  • 40
  • Very interesting! I had never seen `connect by regex_substr` used like this. For anyone interested in more info this pattern and how it works you can check out alfasin's answer here http://stackoverflow.com/questions/19195280/connect-by-clause-in-regex-substr – BateTech Mar 11 '15 at 11:24
  • Why is the `AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL` needed? – BateTech Mar 11 '15 at 11:28
  • If you omit it, you get a cartesian join. I don't really understand why, personally, but it works. – David Faber Mar 11 '15 at 13:11
-1

Try use the distinct keyword in your select statement, to pull in unique values for the comments. Like this:

select distinct COMMENTS, count(*) as TOTAL 
from NOTE 
Where COMMENTS like '%BN%' AND CREATE_DATE between '01/1/2015' AND    
'11/03/2015'
group by COMMENTS
order by Total desc; 
AdamP.
  • 137
  • 4