7

I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it:

value.match(/[\d]+/)[0]

The data in my column is in the format of

abcababcabc 1234566 abcabcbacdf

The result is "null", and I have no idea why! It is also null if I use \w instead of \d.

pnuts
mchangun

1 Answer

8

OpenRefine's match() has to match the whole string: it doesn't add implicit wildcards to the ends of the pattern as some systems do (and as one might expect). Try this pattern instead:

value.match(/.*?(\d+).*?/)[0]
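
To see why the bare pattern gives null while the wrapped one works, here is a rough Python analogue. It is only a sketch: it assumes that match() requires the pattern to cover the entire cell value and returns the capture groups, and it uses Python's re.fullmatch as a stand-in for that behaviour. The helper name grel_like_match is made up for illustration and is not part of OpenRefine.

```python
import re


def grel_like_match(value, pattern):
    """Hypothetical stand-in for match(): the pattern must cover the
    whole string; returns the list of capture groups, or None."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None


cell = "abcababcabc 1234566 abcabcbacdf"

# The bare digit pattern cannot cover the letters and spaces, so no match.
bare = grel_like_match(cell, r"[\d]+")

# Wildcards on both sides let the pattern span the whole cell value.
wrapped = grel_like_match(cell, r".*?(\d+).*?")

print(bare)
print(wrapped)
```

Under that assumption, the first call comes back empty and the second returns a one-element list, which is why the [0] index is needed to get at the digits.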

You need the lazy/non-greedy qualifier (i.e. the question mark) on the wildcards so that they don't gobble up some of your digits too. If you just use /.*(\d+).*/, you'll only capture a single digit, because the rest of them will be taken by the first .* pattern.
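
The greedy versus lazy difference can be checked with a small experiment. This sketch uses Python's re module rather than OpenRefine itself, on the assumption that both regex engines backtrack the same way for these simple patterns. The variable names are just for illustration.

```python
import re

cell = "abcababcabc 1234566 abcabcbacdf"

# Greedy: the leading .* grabs as much as it can and only gives back
# the minimum needed for (\d+) to match, which is the last digit.
greedy = re.fullmatch(r".*(\d+).*", cell).group(1)

# Lazy: the leading .*? stops at the first digit, so (\d+) gets them all.
lazy = re.fullmatch(r".*?(\d+).*?", cell).group(1)

print(greedy)  # 6
print(lazy)    # 1234566
```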

Full documentation for the implementation can be seen in Java's Pattern class docs.

Tom Morris
  • Hi Tom - Thanks for your answer. I tried your suggestion but I'm still getting a NULL – mchangun Jul 27 '13 at 15:24
  • I thought perhaps it was because I was using the development version, but I just went back, copied and pasted the exact data and regex from this page into Refine 2.5, and got 1234566, so I'm not really sure what to suggest. – Tom Morris Jul 27 '13 at 20:18
  • It actually works now - I was trying it on an input which is slightly different. Can you explain / parse what your regex means? What is the lazy/non-greedy qualifier? Also the documentation I am reading doesn't seem to help (https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Regular-Expressions). Is there another reference that explains regex in Open Refine? Thank you! – mchangun Jul 28 '13 at 10:44
  • It uses Java's regex implementation, so the Pattern class docs are the reference (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html). I've updated my answer with more explanation. – Tom Morris Jul 28 '13 at 15:06