There are several solutions to your problem. I would choose solution 2 or 3 depending on your Python skills.
Solution 1: Python/Jython
If you are familiar with Python you could change the expression language to Python/Jython and then just write Python 2.7 code.
Following your example the code for the transformation dialog would look like:
list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
for i in list:
if value == i:
return "United States"
return value
Solution 2: Python/Jython using set
An improved version would be to use a set instead of a list and skip the for loop:
states = set(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'])
return "United States" if value in states else value
Solution 3: GREL
If you want to use GREL instead of Python/Jython the solution would look like the following code snippet using an if-control and array functionality from GREL.
if(
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'].inArray(value),
"United States",
value
)
Solution 4: Text Facet
In cases where you do not have list of the values to extract available you could use the text facet functionality from OpenRefine to filter the rows you want and then use the text transformation dialog with a fixed value like "United States"
. Note the quotes that tells the GREL interpreter to use the string "United States" instead of trying to interpret United States
as GREL function.
Solution 5: Reconciliation
Sometimes you have quite mixed data but with a normalized vocabulary in a column. One trick so separate such a column is to use reconciliation against a data source like Wikidata that already has the data sorted into categories/types/.... After the reconciliation process you can then load the category/type/... as additinonal column that can be used to filter the data.