0

I've tried to do the following operation with hive regexp_replace without success :

transforming :

/foo1/foo2/foo3-bar1/p-115390-20.html' or '/foo1/foo2/foo3-bar1/p-115390-35.html

into : /foo1/foo2/foo3-bar1/p-115390.html

knowing that I have urls like /foo1/foo2/foo3-bar1/p-115390.html which I don't want to be transformed : /foo1/foo2/foo3-bar1/p-115390.html stays /foo1/foo2/foo3-bar1/p-115390.html

I tried to use without success :

SELECT regexp_replace('/foo1/foo2/foo3-bar1/p-115390-20.html', 'p\-\d+(\-\d+\).html', '')

Numbers 115390 or 115390-20 can be any number of course.

  • possible duplicate of [How to do a regular expression replace in MySQL?](http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql) – lnrdo Feb 23 '15 at 18:21
  • Hi, this is a Hadoop question, hope that helps. No MySQL or other DBMS involved. Only Hadoop. Thanks –  Feb 24 '15 at 09:39
  • Sorry, don't know why I missed questions tags. – lnrdo Feb 24 '15 at 09:49
  • No prob, this is Hadoop with a Hive layer. Thanks –  Feb 24 '15 at 09:50

2 Answers2

0

Is this an ORACLE question? (You didn't tag your DBMS) You might try this

SELECT REGEXP_REPLACE(COLUMNNAME,'(\/foo1\/foo2\/foo3\-bar1\/p-\d+)\-\d+\.html','\1.html')

or

SELECT REGEXP_REPLACE('/foo1/foo2/foo3-bar1/p-115390-20.html','(\/foo1\/foo2\/foo3\-bar1\/p-\d+)\-\d+\.html','\1.html')

Finally, a shorter version of that which will possibly do just as well. It depends on how strict your matching needs to be.

SELECT REGEXP_REPLACE('/foo1/foo2/foo3-bar1/p-115390-20.html','(p-\d+)\-\d+\.html','\1.html')
Regular Jo
  • 5,190
  • 3
  • 25
  • 47
0

Found the answer : SELECT REGEXP_REPLACE('/foo1/foo2/foo3-bar1/p-115390-20.html','(p+\-+[0-9]+)\-+[0-9]+\.+html$','$1\.html')

Thanks again for your help