2

i am working with Google Refine right now. My goal is to split a single, existing column into two parts. I am using the built in "add column based on...“-function. The column contains a street name and the coressponding house number. For example, I've got this:

Stackoverflow Street 89

Every row is structured in the same way, which means that a streetname can be long or short, can contain one or more words, but it always ends with a (house) number and its always separated by a blank space.

As I am working with Google Refine, i was about to do this by using GREL statements (Google Refine Expression Language). The idea is like that: Split every value divided by a blank space into pieces. Save each split value in a variable. Then check if the specific values contains a numeric value (which is the house number). If it is a numeric value, take that value and forward it to the output. I already got something like this.

forEach(value.split(" "), v, v).join(";")

Here is the question: How do i integrate an IF function within the forEach function? If i would know how to do it, i would probably be able to not only split the values, but also check if its numeric. To check if it is numeric, one can use isNumeric(value).

I appreciate every hint. Thanks, Jörg.

user1260086
  • 31
  • 1
  • 9

2 Answers2

3

If all the constraints that you describe hold true (it's always the last piece and always exists), I'd just use value.split('')[-1]

In answer to the direct question that you asked, in case someone else finds this in search, your expression so far is equivalent to:

value.split(" ").join(";")

ie the forEach isn't doing anything. To integrate the if(), use it as part of the expression which forms the 3rd parameter to forEach

forEach(value.split(" "), v, if(isNumeric(v),null,v)).join(";")

I'm not sure what output format you're trying to achieve, but this will give you a semi-colon separated list of all non-numeric words in your cell value.

Tom Morris
  • 10,490
  • 32
  • 53
1

I would use Edit column > Split into several columns... on your column and split it with the regex \s(?=\d+), which means that the split has to be where a space character is followed by more than one numbers, and the numbers are not part of the split.

halex
  • 16,253
  • 5
  • 58
  • 67