Questions tagged [openrefine]

OpenRefine is the current name of the data-cleaning tool formerly called Google Refine (and originally released as Freebase Gridworks).

400 questions
13
votes
2 answers

How to perform approximate (fuzzy) name matching in R

I have a large data set, dedicated to biological journals, which was compiled over a long time by different people. As a result, the data are not in a single format. For example, in the column "AUTHOR" I can find John Smith, Smith John, Smith J and so…
group413
  • 159
  • 1
  • 1
  • 5
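
One way to approach the name variants in the question above is key-collision clustering, the idea behind OpenRefine's fingerprint clustering method: normalize each name to a key so that reordered or differently punctuated forms collide. The sketch below is in Python rather than R, and `fingerprint` and `cluster` are hypothetical helper names. Under these assumptions it groups John Smith with Smith John, but not with Smith J, which would need a looser similarity measure.

```python
import string
from collections import defaultdict

def fingerprint(name: str) -> str:
    """Hypothetical helper: lowercase, strip punctuation, sort unique tokens."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(names):
    """Group names whose fingerprints collide."""
    groups = defaultdict(list)
    for n in names:
        groups[fingerprint(n)].append(n)
    return dict(groups)

# Sample values taken from the question, plus one punctuated variant.
authors = ["John Smith", "Smith John", "Smith, John", "Smith J"]
clusters = cluster(authors)
```

Abbreviated forms such as Smith J end up in their own cluster, so a second pass with an edit-distance or initials-based comparison would still be needed for them.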
9
votes
2 answers

GREL to apply to ALL columns or current column

I have a transposition that I'd like to apply to multiple columns. The generated GREL shows the columnName or base name, but that means I have to edit the code for each column. Thought there was a way to find the column index and have code that…
Sonicthoughts
  • 548
  • 1
  • 4
  • 16
8
votes
2 answers

Use POST method with URL and Google Refine / OpenRefine

OpenRefine http://openrefine.org/ allows URL generation using GREL as tokens. I want to connect to an API which only supports the POST method. Can I format the URL so it calls the REST API using POST? Ref:…
Sonicthoughts
  • 548
  • 1
  • 4
  • 16
7
votes
1 answer

Replace null values in cell

I am unable to replace null values in cells. I have created a facet to display only cells that have null values. I then went to Edit cells > Transform and tried to use the replace function, but it does not seem to be working. Different…
Chris Smith
  • 399
  • 3
  • 16
7
votes
1 answer

Value.match() Regex in Google Refine

I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it: value.match(/[\d]+/)[0] The data in my column is in the format of abcababcabc 1234566 abcabcbacdf The result is "null". I have no idea…
mchangun
  • 9,814
  • 18
  • 71
  • 101
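
A likely cause of the null result in the question above is that GREL's `match` tries to match the regex against the entire string and returns the capture groups, rather than searching for a substring. The sketch below reproduces that behaviour in Python with `re.fullmatch`, which is an analogy and not GREL itself. The analogous GREL expression would presumably be `value.match(/.*?(\d+).*/)[0]`.

```python
import re

# Sample cell value taken from the question.
cell = "abcababcabc 1234566 abcabcbacdf"

# A whole-string match fails because the cell is not made of digits only.
whole = re.fullmatch(r"[\d]+", cell)

# Allowing arbitrary text around a capture group lets the whole string
# match; the group then holds the digit sequence.
wrapped = re.fullmatch(r".*?(\d+).*", cell)
number = wrapped.group(1) if wrapped else None
```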
6
votes
1 answer

Simple OpenRefine IF to create a new column

I'm trying to create a new column which contains true or false. Basically column A has a number in it, between 1 and 6; if it's higher than 3 I want the new column 'match' to contain true, otherwise it contains false. Using the add column based on…
Paul M
  • 3,937
  • 9
  • 45
  • 53
6
votes
2 answers

Split multi valued cells in more than one column into rows (Open Refine)

I have been cleaning a table in OpenRefine. I now have it like this:

REF              | Handle   | Size    | Price
2002, 2003       | t-shirt1 | M, L    | 23
3001, 3002, 3003 | t-shirt2 | S, M, L | 24

I need to split those multivalued…
AnaRita
  • 127
  • 1
  • 11
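
The transformation asked about above can be sketched outside OpenRefine as follows. This is a minimal Python illustration, not an OpenRefine recipe: `split_rows` is a hypothetical helper, and it assumes the i-th value in REF belongs with the i-th value in Size, which the excerpt suggests but does not state.

```python
def split_rows(row, multi_cols, sep=","):
    """Expand one row into several, pairing the i-th values of each
    multi-valued column and copying the remaining columns unchanged."""
    parts = {c: [v.strip() for v in row[c].split(sep)] for c in multi_cols}
    lengths = {len(v) for v in parts.values()}
    if len(lengths) != 1:
        raise ValueError("multi-valued columns have different numbers of values")
    out = []
    for i in range(lengths.pop()):
        new_row = dict(row)
        for c in multi_cols:
            new_row[c] = parts[c][i]
        out.append(new_row)
    return out

# First data row from the question's table.
row = {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"}
rows = split_rows(row, ["REF", "Size"])
```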
6
votes
1 answer

How to resolve IncompatibleClassChangeError interface not implemented

I know the question has been asked already, but somehow I can't find any convincing solution after googling for about an hour. I am using apache-jena to load an RDF model from a URL, and I am getting an IncompatibleClassChangeError with the following message: Class…
Ahsan Iqbal
  • 1,422
  • 5
  • 20
  • 39
6
votes
1 answer

openrefine flag changed rows

I'm using OpenRefine to clean up an Excel data set. I have about 70 operations and I've been cutting and pasting on different data sets. I maintain a record id and export to a new Excel sheet. Then I reload the sheet using the record id. It works…
Sonicthoughts
  • 548
  • 1
  • 4
  • 16
5
votes
2 answers

How to effectively use the OpenCorporates Reconciliation API?

How do I use the OpenCorporates API? For instance, according to the website: The Open Refine Reconciliation API allows OpenRefine users to match company names to legal corporate entities. This is especially useful when you have an existing spreadsheet or…
user9940344
  • 574
  • 1
  • 8
  • 26
5
votes
2 answers

Extract an HTML tag that contains a string in OpenRefine?

There is not much to add to the title. It's what I'm trying to do. Any suggestions? I reviewed the docs on GitHub and googled extensively. The best I got is: value.parseHtml().select('p[contains('xyz')]') It results in a syntax error.
treakec
  • 139
  • 1
  • 9
5
votes
2 answers

Searching and replacing multiple values in Google Refine

I'd like to search and replace multiple values in a column with a single function with GREL (or anything else) in Google Refine. For example: 1. replace(value, "Buch", "bibo:Book") 2. replace(value, "Zeitschrift", "bibo:Journal") 3. replace(value,…
CH_
  • 685
  • 1
  • 7
  • 18
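
The idea of doing several replacements in one function, as asked above, can be sketched with a lookup table. The example is plain Python, not GREL; `MAPPING` and `replace_all` are hypothetical names, and only the two pairs that are fully visible in the excerpt are included.

```python
import re

# Only the two replacement pairs fully shown in the question.
MAPPING = {"Buch": "bibo:Book", "Zeitschrift": "bibo:Journal"}

# One alternation pattern, so every key is replaced in a single pass.
PATTERN = re.compile("|".join(re.escape(key) for key in MAPPING))

def replace_all(value: str) -> str:
    """Replace every occurrence of a mapping key with its target term."""
    return PATTERN.sub(lambda m: MAPPING[m.group(0)], value)
```

A single pass avoids one replacement accidentally rewriting the output of an earlier one, which can happen when separate replace calls are chained.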
5
votes
1 answer

Google Refine: iterate over a JSON dictionary

I've got some JSON within Google Refine - http://mapit.mysociety.org/point/4326/0.1293497,51.5464828 for the full version, but abbreviated it's like this: {1234: {'name': 'Barking', 'type': 'WMC'}, 5678: {'name': 'England', 'type': 'EUR'} } I only…
Dragon
  • 2,017
  • 1
  • 19
  • 35
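
Iterating over a dictionary-shaped JSON value like the one in the question above can be sketched as follows. This is Python, not GREL, and because the excerpt is cut off, filtering on the `type` field is an assumption about what the asker wants. The sample is rewritten as valid JSON, with string keys and double quotes.

```python
import json

# Abbreviated sample from the question, rewritten as valid JSON.
raw = ('{"1234": {"name": "Barking", "type": "WMC"}, '
       '"5678": {"name": "England", "type": "EUR"}}')
areas = json.loads(raw)

# Visit every entry regardless of its key and index the names by type.
names_by_type = {entry["type"]: entry["name"] for entry in areas.values()}

# Assumed goal: keep only the entries of one type.
wmc_names = [entry["name"] for entry in areas.values() if entry["type"] == "WMC"]
```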
4
votes
2 answers

How to facet multiple columns in Google Refine

I have a data set with 30 columns and multiple rows (some cells have no data). I would like to be able to facet the columns in groups.

     1 2 3 4...
Row1 A B C D
Row2 E A D F
Row3 Q A B H

Given the above data I would like the facet to return…
banjanxed
  • 73
  • 2
  • 6
4
votes
2 answers

Openrefine: Split multi-valued cells by token/word count?

I have a large corpus of text data that I'm pre-processing for document classification with MALLET using openrefine. Some of the cells are long (>150,000 characters) and I'm trying to split them into <1,000 word/token segments. I'm able to split…
DFM
  • 43
  • 4
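
Splitting a long text into segments of fewer than 1,000 tokens, as described above, can be sketched like this. It is a minimal Python illustration that assumes whitespace-separated tokens; `chunk_words` is a hypothetical helper and the synthetic sample stands in for a real cell.

```python
def chunk_words(text: str, max_words: int = 999):
    """Split text into consecutive segments of at most max_words tokens."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Synthetic 2,500-token sample standing in for a long cell.
sample = " ".join(f"w{i}" for i in range(2500))
segments = chunk_words(sample)
```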