0

I get stuck by counting hashtags with HiveQL. My problem: I have these format of the hashtags in one row:


jurassicworld;movie;night;dino

jurassicWorld;book;yourtickets;movie

jurassicWorld;movie


I looked at the https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF from Hive, but there is no function, which i can choose a delimiter (;) to seperat these hashtags and count them.

my result should be look like this:

+---------------+-----------+
| Hashtag       | Count     |
+---------------+-----------+
| jurassicworld | 300       |
| movie         | 200       |
| night         | 100       |
| dino          | 250       | 
| book          | 50        |  
| etc...        | 100       |
+---------------+-----------+
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Daniel
  • 552
  • 2
  • 9
  • 29
  • possible duplicate of [Hive: converting a comma separated string to array for table generating function](http://stackoverflow.com/questions/13754648/hive-converting-a-comma-separated-string-to-array-for-table-generating-function) http://stackoverflow.com/questions/12160304/hadoop-hive-split-a-single-row-into-multiple-rows – o-90 Jun 19 '15 at 22:15
  • http://stackoverflow.com/questions/12160304/hadoop-hive-split-a-single-row-into-multiple-rows – o-90 Jun 19 '15 at 22:17

1 Answers1

0

I have created following dummy table deli -

hive> describe deli;
OK
row1                    string                  None

I used following query -

select hashTag, count(*) as data from deli LATERAL VIEW explode(split(row1,'\\;')) t1 AS hashTag group by hashTag;

And, its giving me following result for your data-

book    1
dino    1
jurassicWorld   2
jurassicworld   1
movie   3
night   1
yourtickets     1
orNehPraka
  • 413
  • 2
  • 6
  • 14