
I want to use regexp_replace or regexp_extract in Hive to keep everything before the last occurrence of the special character "^", dropping the last "^" and the text after it. For example:

  • 101^11 should extract 101
  • 101^1111^232 should extract 101^1111
  • 144^899^890^1 should extract 144^899^890

I have to do this with regex only, not with string functions or a UDF.

ae8

1 Answer


Try with the regexp_extract function.

In hive:

hive> select regexp_extract('101^11',"(.*)\\^",1);
+------+--+
| _c0  |
+------+--+
| 101  |
+------+--+
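The pattern relies on `(.*)` being greedy: it consumes as much as possible, so the trailing `\^` is forced to match the last caret and group 1 captures everything before it. A minimal sketch of the same regex in plain Python (outside Hive) against the question's three inputs:

```python
import re

# Greedy (.*) backtracks only as far as needed for \^ to match,
# so the caret matched is the LAST one in the string.
pattern = re.compile(r"(.*)\^")

def extract_prefix(s):
    m = pattern.match(s)
    return m.group(1) if m else s

print(extract_prefix("101^11"))         # 101
print(extract_prefix("101^1111^232"))   # 101^1111
print(extract_prefix("144^899^890^1"))  # 144^899^890
```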

In Scala:

scala> val hiveContext = new HiveContext(sc)
scala> val df= hiveContext.sql("""select regexp_extract('101^11','(.*)\\\^',1)""");
scala> df.show()
+---+
|_c0|
+---+
|101|
+---+

In Pyspark:

>>> hiveContext=HiveContext(sc)
>>> df=hiveContext.sql("select regexp_extract('101^11','(.*)\\\^',1)")
>>> df.show()
+---+
|_c0|
+---+
|101|
+---+
notNull
    I have tried this example and it was a success. Thanks for the quick help. – ae8 Jun 10 '18 at 12:24
  • 1
    If I try the same query with Spark SQL, no data is returned for the extracted column: DataFrame df = hiveContext.sql("select regexp_extract(eid,'(.*)\\^',1) as eid1, name from employee"); System.out.println("schema " + df.schema()); System.out.println("first record " + df.first()); prints first record[,rakes] – ae8 Jun 10 '18 at 14:15
  • Please enclose with triple quotes ***val df= hiveContext.sql("""select regexp_extract('101^11','(.*)\\^',1)""");*** it should work..!! – notNull Jun 10 '18 at 16:27
    I am trying this in Spark SQL with Java; it shows an error with the triple quotes in Eclipse. – ae8 Jun 11 '18 at 02:33
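The Java problem in the comments is most likely one escaping layer too few rather than the triple quotes themselves: the regex engine must see `(.*)\^`, Hive unescapes backslashes inside SQL string literals, and the host-language literal doubles each backslash once more, so a Java string would need `"(.*)\\\\^"` (this exact Java literal is an assumption; it was not run in Spark here). A Python sketch of how the backslashes collapse layer by layer:

```python
import re

# Host-language literal with four backslashes; the compiler
# collapses it to the SQL text  (.*)\\^
sql_text = "(.*)\\\\^"
assert sql_text == r"(.*)\\^"

# Hive's SQL parser unescapes the literal once more,
# leaving the regex  (.*)\^  for the engine.
hive_regex = sql_text.replace("\\\\", "\\")
assert hive_regex == r"(.*)\^"

# The greedy (.*) then stops at the last caret:
print(re.match(hive_regex, "144^899^890^1").group(1))  # 144^899^890
```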