Value.match() Regex in Google Refine

Question

I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it:

value.match(/[\d]+/)[0]

The data in my column is in the format of

abcababcabc 1234566 abcabcbacdf

The result is "null", and I have no idea why. It is also null if I use \w instead of \d.

Tom Morris · Accepted Answer · 2013-07-28T15:07:05.233


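OpenRefine's match() has to match the entire cell value: it doesn't add implicit wildcards around the pattern as some systems do (and as one might expect). It also returns only the capture groups, so the digits need to be wrapped in parentheses. Try this pattern instead: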

value.match(/.*?(\d+).*?/)[0]

You need the lazy (non-greedy) qualifier, i.e. the question mark, on the wildcards so that they don't gobble up some of your digits too. If you just use /.*(\d+).*/, you'll capture only a single digit, because the leading .* is greedy and takes all the other digits before \d+ gets a chance.
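Since the regex engine is Java's, the difference can be reproduced outside OpenRefine. This is only a rough sketch: it assumes that match() behaves like Java's Matcher.matches() (whole-string match) and that [0] corresponds to the first capture group. The class name WholeStringMatchDemo and the helper firstGroup are made up for illustration and are not part of OpenRefine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeStringMatchDemo {
    // Hypothetical helper: returns capture group 1 if the regex matches the
    // ENTIRE input, or null otherwise. Assumed to approximate what
    // value.match(/.../)[0] yields in OpenRefine.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null; // the pattern did not cover the whole string
        }
        // A pattern without parentheses has no capture groups to return.
        return m.groupCount() >= 1 ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String cell = "abcababcabc 1234566 abcabcbacdf";

        // No wildcards: digits alone cannot match the whole cell.
        System.out.println(firstGroup("[\\d]+", cell));        // null

        // Lazy leading wildcard: \d+ gets the full run of digits.
        System.out.println(firstGroup(".*?(\\d+).*?", cell));  // 1234566

        // Greedy leading wildcard: it swallows all but the last digit.
        System.out.println(firstGroup(".*(\\d+).*", cell));    // 6
    }
}
```

The first call mirrors the null from the question, the second mirrors the suggested pattern, and the third shows the greedy case described above.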

Full documentation for the implementation can be seen in Java's Pattern class docs.

edited Jul 28 '13 at 15:07

answered Jul 27 '13 at 13:39

Tom Morris


Hi Tom - Thanks for your answer. I tried your suggestion but I'm still getting a NULL – mchangun Jul 27 '13 at 15:24
I thought perhaps it was because I was using the development version, but I just went back and copied and pasted the exact data and regex from this page into Refine 2.5 and got 1234566, so I'm not really sure what to suggest. – Tom Morris Jul 27 '13 at 20:18
It actually works now - I was trying it on an input which is slightly different. Can you explain / parse what your regex means? What is the lazy/non-greedy qualifier? Also the documentation I am reading doesn't seem to help (https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Regular-Expressions). Is there another reference that explains regex in Open Refine? Thank you! – mchangun Jul 28 '13 at 10:44
It uses Java's regex implementation, so the Pattern class contains the documentation http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I've updated my answer with more explanation. – Tom Morris Jul 28 '13 at 15:06
