2

For example i have the table called 'Table1'. and column called 'country'. I want to count the value of word in string.below is my data for column 'country':

country:
"japan singapore japan chinese chinese chinese"

expected output: in above data we can see the japan appear two time, singapore once and chinese 3 times.i want to count value of word where japan is count as one, singapore as one and chinese as one. hence the ouput will be 3. please help me

ValueOfWord: 3
Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
Haziq Azmi
  • 49
  • 1
  • 8
  • 2
    You shouldn't be storing multiple values in a single column. If you properly normalize your model, this is a very easy query. –  Nov 15 '15 at 07:26
  • Did you store that string in a single entry??? – Euclid Ye Nov 15 '15 at 07:28
  • yes. it is in single entry i want to count the value of every word in a single entry string – Haziq Azmi Nov 15 '15 at 07:45
  • Split the delimited string into rows, and then count the distinct. See [**Split single comma delimited string into rows in Oracle**](http://lalitkumarb.com/2014/12/02/split-comma-delimited-string-into-rows-in-oracle/) – Lalit Kumar B Nov 15 '15 at 08:22

3 Answers3

1

Firstly, it is a bad design to store multiple values in a single column as delimited string. You should consider normalizing the data as a permanent solution.

With the denormalized data, you could do it in a single SQL using REGEXP_SUBSTR:

SELECT COUNT(DISTINCT(regexp_substr(country, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM table_name
  CONNECT BY LEVEL <= regexp_count(country, ' ')+1 
/

Demo:

SQL> WITH sample_data AS
  2    ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
  3    )
  4  -- end of sample_data mocking real table
  5  SELECT COUNT(DISTINCT(regexp_substr(str, '[^ ]+', 1, LEVEL))) as "COUNT"
  6  FROM sample_data
  7    CONNECT BY LEVEL <= regexp_count(str, ' ')+1
  8  /

     COUNT
----------
         3

See Split single comma delimited string into rows in Oracle to understand how the query works.


UPDATE

For multiple delimited string rows you need to take care of the number of rows formed by the CONNECT BY clause.

See Split comma delimited strings in a table in Oracle for more ways of doing the same task.

Setup

Let's say you have a table with 3 rows like this:

SQL> CREATE TABLE t(country VARCHAR2(200));

Table created.

SQL> INSERT INTO t VALUES('japan singapore japan chinese chinese chinese');

1 row created.

SQL> INSERT INTO t VALUES('singapore indian malaysia');

1 row created.

SQL> INSERT INTO t VALUES('french french french');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> SELECT * FROM t;

COUNTRY
---------------------------------------------------------------------------
japan singapore japan chinese chinese chinese
singapore indian malaysia
french french french
  • Using REGEXP_SUBSTR and REGEXP_COUNT:

We expect the output as 6 since there are 6 unique strings.

SQL> SELECT COUNT(DISTINCT(regexp_substr(t.country, '[^ ]+', 1, lines.column_value))) count
  2    FROM t,
  3      TABLE (CAST (MULTISET
  4      (SELECT LEVEL FROM dual
  5              CONNECT BY LEVEL <= regexp_count(t.country, ' ')+1
  6      ) AS sys.odciNumberList ) ) lines
  7    ORDER BY lines.column_value
  8  /

     COUNT
----------
         6

There are many other methods to achieve the desired output. Let's see how:

  • Using XMLTABLE
SQL> SELECT COUNT(DISTINCT(country)) COUNT
  2  FROM
  3    (SELECT trim(COLUMN_VALUE) country
  4    FROM t,
  5      xmltable(('"'
  6      || REPLACE(country, ' ', '","')
  7      || '"'))
  8    )
  9  /

     COUNT
----------
         6
  • Using MODEL clause
SQL> WITH
  2       model_param AS
  3       (
  4        SELECT country AS orig_str ,
  5               ' '
  6               || country
  7               || ' '                                 AS mod_str ,
  8               1                                      AS start_pos ,
  9              Length(country)                           AS end_pos ,
 10              (LENGTH(country) -
 11              LENGTH(REPLACE(country, ' '))) + 1        AS element_count ,
 12              0                                      AS element_no ,
 13              ROWNUM                                 AS rn
 14        FROM   t )
 15        SELECT COUNT(DISTINCT(Substr(mod_str, start_pos, end_pos-start_pos))) count
 16        FROM (
 17              SELECT *
 18              FROM   model_param
 19              MODEL PARTITION BY (rn, orig_str, mod_str)
 20              DIMENSION BY (element_no)
 21              MEASURES (start_pos, end_pos, element_count)
 22              RULES ITERATE (2000)
 23              UNTIL (ITERATION_NUMBER+1 = element_count[0])
 24            ( start_pos[ITERATION_NUMBER+1] =
 25                      instr(cv(mod_str), ' ', 1, cv(element_no)) + 1,
 26              end_pos[ITERATION_NUMBER+1] =
 27                      instr(cv(mod_str), ' ', 1, cv(element_no) + 1) )
 28            )
 29         WHERE    element_no != 0
 30    ORDER BY      mod_str , element_no
 31   /

     COUNT
----------
         6
Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
  • this query work fine but when apply in large amount of data it never show result – Haziq Azmi Nov 15 '15 at 09:33
  • your demo for a string in the sample_data but i already store data in database under column country : record 1="japan singapore japan chinese chinese chinese" record 2="singapore indian malaysia" i want it to count include record1 & record2 @Lalit Kumar B – Haziq Azmi Nov 15 '15 at 10:05
  • @HaziqAzmi I have provided a complete answer in the update. Please mark it as answered, would help others! – Lalit Kumar B Nov 15 '15 at 10:54
  • @HaziqAzmi On the top left of my answer, there is an up arrow to upvote the answer. Below it is a tick mark, just click on it to accept the answer, it would turn green in colour when you accept it. – Lalit Kumar B Nov 15 '15 at 12:18
0

Did you store that kind of string in a single entry?

If not, try

SELECT COUNT(*)
    FROM (SELECT DISTINCT T.country FROM Table1 T)

If yes, I would write an external program to parse the string and return the result you want.

Like using java.

Create a String set.

I would use JDBC to retrieve the record, and use split to split strings in tokens using ' 'delimiter. For every token, if it is not in the set, add it to the set.

When parse finishes, get the length of the set, which is the value you want.

Euclid Ye
  • 501
  • 5
  • 13
  • yes..the string in a single entry. it is like long string.. "japan singapore japan chinese chinese chinese" is represent as row 1 – Haziq Azmi Nov 15 '15 at 07:39
  • It is not easy query i think.. normally people count a certain word in string. for example: SELECT (LENGTH(QUESTION_ITEM) - LENGTH(REPLACE(QUESTION_ITEM,'EXPLAIN')))/LENGTH('EXPLAIN') FROM CLASSIFY – Haziq Azmi Nov 15 '15 at 07:48
  • please show me the external program to parse the string and return the result – Haziq Azmi Nov 15 '15 at 08:11
  • This doesn't answer OP's question. Remember, if you could it in `SQL` then do it in `SQL`, if not, do it in `PL/SQL`, if not do it in `JAVA`, if not do it in `C`. In OP's case it could be done in pure `SQL`. – Lalit Kumar B Nov 15 '15 at 11:19
0

Break the string based on the space delimiter

SELECT COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
  FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1

For counting DISTINCT words

    SELECT col,
COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
  FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1
GROUP BY col

FIDDLE

Mihai
  • 26,325
  • 7
  • 66
  • 81
  • 1
    This doesn't really answer OP's question. You are returning the count of each word, however, OP wants distinct count. Also, no need of `trim` as you only need the count. – Lalit Kumar B Nov 15 '15 at 08:18