I have a column with 100,000+ strings in it. I wish to have Google Refine replace these strings with their Fingerprint.
I selected the column in Google Refine and created a Text Facet. From that Text Facet I can select "Cluster". This shows me the clusters, which I assume are groups of string values that share the same fingerprint, and lets me choose a New Cell Value for each one, which defaults to the value of the first member of the cluster.
I want this new value to be just the fingerprint. The reason is that I need to perform this operation on multiple files, and strings that belong to the same cluster must end up with the same value in every file. I cannot concatenate the files, because that produces too much data for Refine to handle, even after tuning the memory parameters as described in the Refine FAQ.
So I am simply looking for an operation that takes each cell in a column, calculates its Fingerprint, and replaces the value in the column with its Fingerprint.
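To make the desired operation concrete, here is a rough Python sketch of what I understand the fingerprint method to do: trim, lowercase, strip punctuation and control characters, split on whitespace, sort and de-duplicate the tokens, join them back together, and fold accented characters to ASCII. The function name `fingerprint` and the sample values are my own for illustration, and this only approximates the steps as I understand them, so it may not match Refine's output exactly.

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key for one cell value (illustrative only)."""
    # Trim surrounding whitespace and lowercase.
    s = value.strip().lower()
    # Drop ASCII punctuation and control characters, but keep whitespace
    # so that tokens separated by tabs or newlines do not merge.
    s = "".join(
        ch
        for ch in s
        if ch not in string.punctuation
        and (ch.isspace() or unicodedata.category(ch) != "Cc")
    )
    # Split on whitespace, remove duplicate tokens, and sort them.
    tokens = sorted(set(s.split()))
    joined = " ".join(tokens)
    # Fold accented characters to their unaccented base form.
    decomposed = unicodedata.normalize("NFKD", joined)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical sample column; each value is replaced by its fingerprint.
column = ["  Tom Cruise ", "Cruise, Tom", "tom tom CRUISE."]
fingerprints = [fingerprint(v) for v in column]
```

Because the key depends only on the cell's own content, the same input string would get the same replacement value no matter which file it appears in, which is the property I need.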
I am using Google Refine 2.5 on OS X 10.7.