
I have my own Python library that I would like to use in OpenRefine as described here.

However, it seems that all Python code in OpenRefine runs through Jython, which supports only Python 2.

Is there a way to run Python 3 code in OpenRefine?

cheers

Jan

2 Answers


Short answer: NO. OpenRefine uses Jython, which is currently based on Python 2.7, and there are no immediate or short-term plans to move to 3.x.

BUT.

There is a trick to do this, as long as you have Python 3 installed on your machine. Python 2 can execute a command-line script or tool and collect its output.

This simple Python 2 script does that:

# This Jython 2.7 script must be run with the expression language set to Python / Jython, not GREL.
# It runs the cell value as a command (CLI) in the terminal and retrieves the result.

# import basic libraries (commands exists only in Python 2, which is what Jython 2.7 provides)
import time
import commands
import random
# get the exit status and the output of the command stored in the cell
status, output = commands.getstatusoutput(value)
# pause for a random 2 to 5 seconds to avoid flooding remote servers... Be kind to APIs!
time.sleep(random.randint(2, 5))
# return the output of the command as the new cell value
return output.decode("utf-8")

I use it to execute local Python 3 scripts, but also dig, curl, etc.

Use case: Suppose I have a bunch of internet domains in column A. I want to run a dig SOA command on each of these domains.

  • I create a column B, based on A: "dig SOA "+value, which will provide the exact command I want to execute.
  • I create a column C, based on B, with the above jython script.
  • I then parse the result.
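
The same three steps work for a local Python 3 script, with column B holding something like "python3 /path/to/my_script.py " + value. Here is a minimal sketch of such a script, assuming a hypothetical file name my_script.py and a placeholder transformation (process_value is made up for illustration; replace it with a call into your own Python 3 library):

```python
#!/usr/bin/env python3
"""Hypothetical my_script.py, called once per cell by the Jython snippet."""
import sys


def process_value(text):
    # Placeholder logic (an assumption for illustration): swap in a call
    # to your own Python 3 code here.
    return text.strip().upper()


def main(argv):
    # Everything after the script name is treated as the cell value, so an
    # unquoted value containing spaces still arrives as a single string.
    cell = " ".join(argv[1:])
    # Whatever is printed to stdout is what getstatusoutput() captures.
    print(process_value(cell))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Since the command string is handed to a shell, cell values containing quotes or characters such as ; or & would probably need escaping before being appended to the command.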

This script is pure Python 2 and doesn't rely on any extra libraries, so it should keep working.

Disclaimer: execution of local code by a third-party app should be done cautiously.

hpiedcoq
  • I see. I was also thinking about trying something like this. I guess performance might suffer, since a new process has to be spawned each time, and in my case I process 100K+ entries. Thanks for the tip. I believe this answers my original question – Jan Dec 29 '21 at 09:38
  • Support for Python 3 is not completely unimaginable, I hope we can get there in a not too distant future… https://github.com/OpenRefine/OpenRefine/issues/2249 – pintoch Dec 29 '21 at 20:22

I needed something like that (I had to "guess" the language the text in one column was written in). What I found to be a nice solution, which worked quite fast and made some "extra features" easy to add, was to wrap my Python 3 program as a Flask web API (it took, literally, 10 minutes) and call it from OpenRefine with "Add column by fetching URLs".

The added bonus is that it was rather easy to run it on the fastest machine we had on site, add caching, etc.

The only thing I would like to see improved (on OpenRefine's side) is the ability to optionally fetch several URLs in parallel. Then you could run several Flask instances on several machines and speed things up a little.
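
The web API idea can be sketched roughly as follows. To keep the example dependency-free it uses Python's standard-library http.server rather than Flask, so it is not the code from this answer, only the same pattern. The query parameter name q, the port 5000 and the process_value placeholder are assumptions for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def process_value(text):
    # Placeholder (an assumption): call your own Python 3 code here.
    return {"input": text, "length": len(text)}


def build_response(path):
    # Read the cell value from the "q" query parameter, e.g. /?q=hello
    query = parse_qs(urlparse(path).query)
    text = query.get("q", [""])[0]
    return json.dumps(process_value(text)).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = build_response(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=5000):
    # Blocks until interrupted (Ctrl+C).
    HTTPServer((host, port), Handler).serve_forever()
```

Calling serve() starts the server. In OpenRefine, the fetch expression could then be something like "http://127.0.0.1:5000/?q=" + escape(value, "url"), and the JSON that comes back can be parsed in a follow-up column.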

carlesm
  • Yes, that is also good advice. I've tried it as well and it works. However, it is much slower than running the code locally, and since I'm processing a lot of data, speed is important – Jan Dec 30 '21 at 09:18