I have a string field error_cd with the value "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb"

Within Pig, I'm trying to use REGEX_EXTRACT_ALL(error_cd, '.*(cntrl[a-b]).*') to get back a tuple of (cntrlb,cntrlb,cntrlb,cntrla,cntrlb,cntrlb) or just (cntrl,cntrl,...,cntrl). Instead, I'm getting back just one match (cntrl).

Does anybody know how to return all of the matches, as the function name implies?

user7337271
user1152532

1 Answer

REGEX_EXTRACT_ALL extracts all of the capturing groups in a regular expression; it does not apply a single regular expression multiple times. Your pattern has only one capturing group, so the result tuple has only one field. This document is somewhat out of date, but it is still accurate for REGEX_EXTRACT_ALL.
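
To make the distinction concrete, here is a rough sketch in plain Java rather than Pig. It assumes, as far as I understand the built-in, that REGEX_EXTRACT_ALL passes the pattern to java.util.regex and requires it to match the whole string. The class and method names (GroupCaptureDemo, groupsOfFullMatch, allMatches) are made up for illustration and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCaptureDemo {
    // One whole-string match; returns one value per capturing group.
    // This mirrors what I assume REGEX_EXTRACT_ALL does internally.
    static List<String> groupsOfFullMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    // Applies the pattern repeatedly and collects every non-overlapping match.
    // This is the behavior the question was hoping for.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String errorCd = "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb";
        // One group in the pattern, so one captured value: [cntrlb]
        System.out.println(groupsOfFullMatch(".*(cntrl[a-b]).*", errorCd));
        // [cntrlb, cntrlb, cntrlb, cntrla, cntrlb, cntrlb]
        System.out.println(allMatches("cntrl[a-b]", errorCd));
    }
}
```

The first call returns a single value because the leading .* is greedy and the lone group ends up on the last occurrence; only the find() loop visits every occurrence.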

No regular expression can capture an arbitrary number of groups: the number of groups is fixed by the pattern itself. (See this question.) If you knew an upper limit on how many times your cntrl string could occur, you could design a big ugly regex to capture them all, but it sounds like you'd be better off using TOKENIZE and then treating each word in your string individually.
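
The tokenize-then-filter idea can be sketched in plain Java as below. This is only an illustration of the logic, not Pig code: splitting on whitespace stands in for TOKENIZE, and the per-token whole-string check stands in for whatever filter you apply to each word afterwards. TokenFilterDemo and matchingTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenFilterDemo {
    // Splits the input on whitespace, then keeps the tokens that the
    // pattern matches in full. Each token is checked on its own.
    static List<String> matchingTokens(String input, String tokenRegex) {
        Pattern p = Pattern.compile(tokenRegex);
        List<String> kept = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (p.matcher(token).matches()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String errorCd = "cntrlb cntrlb asdv cntrlb asvd cntrla cntrlb cntrlb";
        // [cntrlb, cntrlb, cntrlb, cntrla, cntrlb, cntrlb]
        System.out.println(matchingTokens(errorCd, "cntrl[a-b]"));
    }
}
```

Because each word is tested separately, the number of occurrences no longer has to be known in advance.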

Community
reo katoa