1

I'm a user of Oracle BI (v. 11.1.1.7.141014). I have a text column "description" and would like to create a new table with the word count for all words in that column. So for instance:

Source:

Description
___________
This is a test
Just a test

Result:

Word    Count
_____________
a       2
test    2
is      1
just    1
this    1

Would it be possible? I have a user account, (no administration features), but I can work on reports (tables, pivot tables, etc.), data structures, custom SQL queries (limited to reports and data structures) and so on...

Thanks in advance

tripuz
  • 11
  • 2
  • You need a parser and some rules. It's tricky. Is a word like "it's" one word or two? – kevinskio Mar 31 '16 at 16:08
  • I think I would filter out stop words such as that one in some way; by the way, the column is in italian language, so no english abbreviations (maybe italian ones...). – tripuz Mar 31 '16 at 16:10
  • Use space as separator. Check this one. http://stackoverflow.com/a/30044861/3470178 – Juan Carlos Oropeza Mar 31 '16 at 16:12
  • Are "this" and "This" the same or different? –  Mar 31 '16 at 16:43
  • The same (no case sensitive filtering) – tripuz Mar 31 '16 at 16:44
  • @JuanCarlosOropeza I tried the solution from the other post but I'm having some trouble customizing it in my report (lots of SQL errors), will try again tomorrow. – tripuz Mar 31 '16 at 16:54
  • @tripuz I didnt want to write the code because doesnt have one oracle BI instance to try. But search for `oracle split string into row` maybe you find a better answer. – Juan Carlos Oropeza Mar 31 '16 at 16:56
  • Note - It is best to avoid using reserved words (like COUNT) as column names, this can only cause trouble. You can name REPORT columns whatever you like, but not table columns in the database. In my solution I called the column "word_count". –  Mar 31 '16 at 17:38

1 Answers1

1

Defining "word" as any sequence of one or more consecutive English letters (upper or lower case), and assuming that "this" and "This" are the same, here is one possible solution. The first line of the code ends in "... from a)," substitute your table name in place of "a" (for my own testing purposes, I created a table with your input data and I called it a).

with b (d, ct) as (select Description, regexp_count(Description, '[a-zA-Z]+') from a),
     h (pos)       as (select level from dual connect by level <= 100),
     prep (word)   as (select lower(regexp_substr(d, '[a-zA-Z]+', 1, pos)) from b, h where pos <= ct)
select   word, count(word) as word_count 
from     prep 
group by word
order by word_count desc, word
/

The solution needs to know beforehand the maximum number of words per input string; I used 100, that can be increased (in the definition of h in the second line of code).

  • Thank you, I tried this solution by pasting it in my analysis in the custom SQL query field, substituting my table name in place of "a" and the column name in place of "Description". Unfortunately, when I try to run the query in my report, OBIEE says there's an error "unresolved table 'prep' in {call NQSGetQueryColumnInfo('SELECT word FROM prep')"...maybe I can't define new temporary tables in OBIEE in this way? – tripuz Apr 01 '16 at 08:27