Questions tagged [google-refine]

OpenRefine (formerly Google Refine) is a free, open-source data cleaning tool.

Google Refine was originally called Freebase Gridworks and was developed by Metaweb before Metaweb's acquisition by Google. In 2012, Google withdrew its support and the code moved to GitHub.

44 questions
7
votes
1 answer

Parse JSON in Google Refine

I'm trying to pull out specific elements from results from the Data Science Toolkit coordinates2politics API, using Google Refine. Here is sample cell #1: [{"politics":[ {"type":"admin2","friendly_type":"country","code":"usa","name":"United…
kateyg
  • 147
  • 1
  • 7
4
votes
1 answer

Progressive number in OpenRefine column

Is it possible to generate a "counter", a progressive number in a column, using GREL? For example, I would like to add a value to that number to generate an identifier for each record.
Aubrey
  • 507
  • 4
  • 20
3
votes
1 answer

Script-driven automation of Google Refine with Ruby, Python, Perl, Java, or otherwise

BACKGROUND: Co-worker Adam has been using Google Refine to process database downloads with much success over the last year or so, but Adam got a new job offer, and consequently all of the work he has done and the expertise he has built in Google Refine is going…
dreftymac
  • 31,404
  • 26
  • 119
  • 182
3
votes
3 answers

How can you parse XML in Google Refine using Jython/Python ElementTree

I'm trying to parse some XML in Google Refine using Jython and ElementTree, but I'm struggling to find any documentation to help me get this working (probably not helped by not being a Python coder). Here's an extract of the XML I'm trying to parse.…
mhawksey
  • 2,013
  • 5
  • 23
  • 61
3
votes
2 answers

Regex for value.contains() in Google Refine

I have a column of strings, and I want to use a regex to find commas or pipes in every cell and then perform an action. I tried this, but it doesn't work (no syntax error, it just matches neither commas nor pipes). if(value.contains(/(,|\|)/),…
Aubrey
  • 507
  • 4
  • 20
3
votes
2 answers

Google Refine recipe for reconciling messy entities in two databases

I have two databases of messy names such as these: Jindal, Bobby Fla. Gov. Bobby Jindal Bobby Jindal 3M Corp. 3M Menomonie I need to find the matches. Can anyone point me to or suggest a good recipe for how to do this in Google Refine? This…
kateyg
  • 147
  • 1
  • 7
3
votes
1 answer

Cell.cross() returns error in Google Refine projects

I'm trying to create a new column based on my main project's Date column that pulls timeline events from another Google Refine project: cell.cross("Clean5 Timeline", "TimelineDate").cells["TimelineEvent"].value[0] The dates are in the same format in…
kateyg
  • 147
  • 1
  • 7
2
votes
1 answer

Google Refine and fetching data from Freebase for a large data set to create a column from URL not working

I have a Google Refine project with 36k rows of data. I would like to add another column by fetching JSON data from a Freebase URL. I was able to get it working on a small dataset, but when I ran it on this project it took a few hours to process and…
Yan
  • 3,533
  • 4
  • 24
  • 45
2
votes
2 answers

Can I call external *Python* functions from Google Refine?

I'm investigating Google Refine to speed up some of my data work -- never used it before this week, but I like a lot of what I see. My biggest question so far is whether it's possible to call external Python functions from Refine. I know you can…
Abe
  • 22,738
  • 26
  • 82
  • 111
2
votes
1 answer

OpenRefine: 'Fill Up' replacing values along one column

I have the following table:

╔════════╦════════╦════════╦════════╗
║ record ║ Brand  ║ Model  ║ Spec   ║
╠════════╬════════╬════════╬════════╣
║ 1      ║ X      ║ null   ║ 1      ║
║        ║ X      ║ DF     ║ 3      ║
║        ║ X      ║ null   ║ 5 …
til
  • 832
  • 11
  • 27
2
votes
2 answers

Extracting email addresses from messy text in OpenRefine

I am trying to extract just the emails from a text column in OpenRefine. Some cells have just the email, but others have the name and email in john doe format. I have been using the following GREL/regex, but it does not return the entire…
Abi Hassen
  • 23
  • 2
2
votes
3 answers

Parsing out part of string in Google Refine - error message

I am cleaning a dataset using Google Refine. I have one column with dates in the mm/dd/yyyy format. I want to create a new column in which mm/dd/yyyy is replaced by yyyy only. I have tried value.replace(/.+(\d\d\d\d)\*/, /$1/) and what showed up…
AnnaGo
  • 23
  • 2
2
votes
1 answer

How to use Google Refine to replace string value with Fingerprint?

I have a column with 100,000+ strings in it. I wish to have Google Refine replace these strings with their Fingerprint. I selected the column in Google Refine, and created a Text Facet. From that Text Facet I can select "Cluster". This will show…
Brian Feeny
  • 441
  • 4
  • 14
2
votes
2 answers

How do I integrate an If function within the forEach function with GREL?

I am working with Google Refine right now. My goal is to split a single, existing column into two parts. I am using the built-in "add column based on..." function. The column contains a street name and the corresponding house number. For example,…
user1260086
  • 31
  • 1
  • 9
2
votes
1 answer

Is there a way to subfacet a table which has been already “faceted”?

I have a table to which I'm applying a customized facet in order to find duplicates (in one column). Now I'd like to apply a new facet (on another column) to the already faceted table. Is that possible? It seems that only one facet can be used per…
mellin
  • 307
  • 5
  • 13