2

I am trying to use the REGEXP_EXTRACT(string, pattern) function in Tableau 10.0. I have worked with regex before in C, R and python. Tableau just doesn't seem to be cooperating for even the simplest regex, all I can get is "Null". My data source is an Excel spreadsheet (.xlsx). I also tried using a CSV file with the same data to no avail.

Examples: Players[1] = "Ezekiel Elliott, RB" (r.e the Players dimension contains values of the format [A-z]{1,40},\s[A-Z]{2}). I am trying to capture the last two letters, which contain the player's football position (RB, QB, etc.) in a calculated field. All following simply return null:

REGEXP_EXTRACT([Players], '(?<=,\s)[A-Z]{2}')

REGEXP_EXTRACT([Players], '(?<=,\s)[A-Z]{2}')

REGEXP_EXTRACT[[Players], "[e,E]")

REGEXP_EXTRACT("Ezekiel Elliott, RB", "[e,E]")

The last two are especially troubling, as I can't even match a letter.

Any solutions, explanations of what I'm doing wrong, or even an arrow pointing in an illuminative direction would be greatly appreciated. I've consulted the reference on ICU regular expressions, http://userguide.icu-project.org/strings/regexp, but with no luck.

Shaun
  • 23
  • 1
  • 6

1 Answers1

3

You need to capture what you need to extract:

REGEXP_EXTRACT[[Players], "([A-Z]{2})$")

The ([A-Z]{2})$ will match the following:

  • ([A-Z]{2}) - matches and captures into Group 1 (what you will get in the end) exactly two ASCII uppercase letters
  • $ - asserts the end of string position.

If you say the entries can be matched with ^[A-Za-z]{1,40},\s([A-Z]{2})$, you may also use this pattern (here, ^ matches the start of string, [A-Za-z]{1,40} will match 1 to 40 ASCII letters, , will match a comma, and the rest of the pattern is the same as above). Note that [A-z] matches more than just letters.

Community
  • 1
  • 1
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563