I have a problem with OpenRefine. I am adding a new column based on a url and from there calling an API for getting some terms from a controlled vocabulary (AAT). I parse the results and I obtain a multivalued cells such as:
http://vocab.getty.edu/aat/300041366||aquatints (prints)::http://vocab.getty.edu/aat/300053242||aquatint (printing process)::http://vocab.getty.edu/aat/300191265||dust bags::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300053242||aquatint (printing process)::http://vocab.getty.edu/aat/300041366||aquatints (prints)::http://vocab.getty.edu/aat/300041368||sandpaper aquatints::http://vocab.getty.edu/aat/300041368||sandpaper aquatints
Where I have the current structure:
URI||Corresponding_TERM::URI||Corresponding_TERM
etc.
I now need to choose one of those records. My solution is to use
value.split("::")[0]
in order to choose which element I want.
Unfortunately this solutions has very evident drawbacks, because the order of the elements in the array is not constant, so if the first element [0] would be the right one for one record, it would probably not be for the next one.
For explain myself better, I now have this kind of structure
-----------------------------------------------------------
|ID | Classification | Term_From_Thesaurus |
| 1 | Aquatints | uri||term1::uri||term2:: |
| 1 | Aquatints | uri||term1::uri||term2:: |
| 2 | Drypoints | uri||term3::uri||term4:: |
| 3 | Woodcut | uri||term5::uri||term6::uri||term7 |
-----------------------------------------------------------
And I need to associate term1 with Aquatints, term 4 with Drypoints and term 7 with Woodcut.
How can I do that? A solution could be using facet and a lot of manual work, but maybe there is a better one? What about going to each record and if ID = 1 they should use term1, if ID=2 should use term 4 etc. Would it be possible? I sincerely do not know how to use the value of another column as variable to determine the result of an operation. cell.cross would help, but in case I need to split the data into two files, and doesn’t seems to me a proper solution..