
I have a table with a column that contains URLs. I want to query a particular URL parameter value from each record. The parameter can appear at any position in the URL, the URL can contain hashbangs (#!), and the parameter value can contain special characters such as -, _ and |.

Data in the table column:

url

http://www.url.com?like=hobby&name=tom-_green

http://www.url.com?name=bob|ghost&like=hobby

I want the query results to be:

name

srini

tom-_green

bob|ghost

I tried a query like this:

Select regexp_extract(url, '(?<=name=)[^&?]*(?:|$&)',2) as name From table_name

I get Java exceptions when I run this query. The exception messages are pretty vague, so I'm hoping someone can help.
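A likely cause of the exceptions: the third argument of regexp_extract selects a capturing group, but (?<=name=) is a lookbehind and (?:...) is non-capturing, so the pattern defines no group at all, let alone group 2. Since Hive's regexp_extract uses Java regex syntax, the sketch below reproduces this with java.util.regex and tries a corrected pattern, [?&#]name=([^&#]*), with index 1. In Hive that would read roughly Select regexp_extract(url, '[?&#]name=([^&#]*)', 1) as name From table_name (untested against a live cluster). The class name, the extract helper, and the exact pattern are illustrative assumptions, not Hive internals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameParamRegex {
    // Pattern from the question: a lookbehind plus a non-capturing group,
    // so it defines zero capturing groups.
    static final Pattern ORIGINAL = Pattern.compile("(?<=name=)[^&?]*(?:|$&)");

    // Suggested replacement (an assumption, not from the thread):
    // [?&#] pins "name=" to a parameter boundary, so "nickname=" is skipped,
    // and group 1 captures the value up to the next '&' or '#'.
    // Characters such as -, _ and | are kept.
    static final Pattern FIXED = Pattern.compile("[?&#]name=([^&#]*)");

    // Hypothetical helper, a rough stand-in for regexp_extract(url, pattern, index):
    // returns group `index` of the first match, or "" when nothing matches.
    static String extract(String url, Pattern p, int index) {
        Matcher m = p.matcher(url);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.url.com?like=hobby&name=tom-_green",
            "http://www.url.com?name=bob|ghost&like=hobby"
        };
        System.out.println("capturing groups in original pattern: "
                + ORIGINAL.matcher("").groupCount());
        try {
            // The match succeeds, but asking for group 2 of a group-less pattern throws.
            extract(urls[0], ORIGINAL, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 2 fails: " + e.getMessage());
        }
        for (String url : urls) {
            System.out.println(extract(url, FIXED, 1));
        }
    }
}
```

Requiring a ?, & or # in front of name= is what keeps keys like nickname= from matching; with a bare lookbehind, any key ending in "name" would also match.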

  • Possible duplicate of [Extract parameter value from url using regular expressions](http://stackoverflow.com/questions/1280557/extract-parameter-value-from-url-using-regular-expressions) – Prune Oct 16 '15 at 17:24
  • See similar questions http://stackoverflow.com/questions/1280557/extract-parameter-value-from-url-using-regular-expressions and http://stackoverflow.com/questions/25586792/extracting-a-url-parameter-value-in-javascript; I think they cover most -- if not all -- of what you need – Prune Oct 16 '15 at 17:25
  • Hi @Prune, I was looking for a Hadoop query, not JavaScript :) I found the answer, but thanks for the help! – sriiniivas Oct 16 '15 at 23:01
  • Right -- but regexp is very similar from one language to another, variations on the UNIX original. I'm glad you got what you needed. – Prune Oct 16 '15 at 23:09

1 Answer


I found a Hive built-in function specifically for handling URLs:

Select parse_url(url, 'QUERY', 'name') as name From table_name

This worked :)

ref: parse_url(string urlString, string partToExtract [, string keyToExtract])

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
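One caveat for the hashbang requirement in the question: as far as I know, parse_url is built on java.net.URL, which treats everything after # as the fragment (the "ref"). If that holds, a name parameter that only appears after a #! is not part of the QUERY part, and a regexp_extract pattern may be the safer choice for those rows. The sketch below illustrates the java.net.URL behaviour directly. queryParam is a hypothetical helper that only approximates parse_url(url, 'QUERY', 'name'); it is not Hive's actual code, and the #! URL is a made-up example.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class QueryParamSketch {
    // Hypothetical approximation of parse_url(url, 'QUERY', key): take the
    // query component from java.net.URL and return the value of the first
    // key=value pair whose key matches, or null if it is absent.
    static String queryParam(String url, String key) {
        String query;
        try {
            // null when there is no '?' before any '#'
            query = new URL(url).getQuery();
        } catch (MalformedURLException e) {
            return null;
        }
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq >= 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(queryParam("http://www.url.com?like=hobby&name=tom-_green", "name"));
        System.out.println(queryParam("http://www.url.com?name=bob|ghost&like=hobby", "name"));
        // Hypothetical hashbang URL: the '?' comes after '#', so java.net.URL
        // files it under the fragment and getQuery() returns null.
        System.out.println(queryParam("http://www.url.com/#!/page?name=srini", "name"));
    }
}
```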