
How to use the opencorp API? For instance

According to the website:

The Open Refine Reconciliation API allows OpenRefine users to match company names to legal corporate entities. This is especially useful when you have an existing spreadsheet or dataset featuring lots of companies. Matching (or reconciling) to legal entities allows you to get more information about the companies (for example the registered address or statutory filings), and makes it easier to match with other datasets or exchange with other organisations.

Following the documentation, I can run a GET query in Postman like this:

https://opencorporates.com/reconcile/suggest?prefix=AMAZON

or even search for companies within specific regions.

This is useful in individual cases, but I have two questions.

1) How can I generalize this to larger sets of data?

2) The website also says:

Matching (or reconciling) to legal entities allows you to get more information about the companies (for example the registered address or statutory filings).

How do I access this information?

The responses from the GET featured in the documentation don't show this information.

user9940344

2 Answers


The reconciliation API implemented by OpenCorporates is specified by OpenRefine on its wiki.

To reconcile larger datasets you should use the multiple queries mode, as follows:

https://opencorporates.com/reconcile?queries={%22q0%22%3A{%22query%22%3A%22cambridge%20analytica%22},%22q1%22:{%22query%22:%22mossack%20fonseca%22},%22q2%22:{%22query%22:%22danske%20bank%22}}

Here is a readable version of the queries parameter in the request above:

{
  "q0": {
    "query": "cambridge analytica"
  },
  "q1": {
    "query": "mossack fonseca"
  },
  "q2": {
    "query": "danske bank"
  }
}
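For a longer list of names, the same `queries` parameter can be built programmatically and sent in batches. The following is a minimal Python sketch rather than an official client. It assumes the endpoint answers with a JSON object keyed by the same `q0`, `q1`, ... identifiers, each holding a `result` list, as the OpenRefine reconciliation specification describes. The helper names and the batch size of 10 are illustrative choices, not documented limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example request above.
RECONCILE_URL = "https://opencorporates.com/reconcile"


def build_queries(names):
    """Map each name to a query keyed q0, q1, ... like the JSON above."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reconcile_url(names):
    """Build the GET URL with the URL-encoded `queries` parameter."""
    return RECONCILE_URL + "?" + urlencode({"queries": json.dumps(build_queries(names))})


def reconcile(names, batch_size=10):
    """Return {name: [candidates]}; batch_size is an arbitrary assumption."""
    results = {}
    for batch in batches(names, batch_size):
        with urlopen(reconcile_url(batch)) as resp:
            payload = json.load(resp)
        # Assumption: the response is keyed by the same q0, q1, ... ids,
        # each with a "result" list of candidates.
        for key, name in zip(build_queries(batch), batch):
            results[name] = payload.get(key, {}).get("result", [])
    return results
```

Calling `reconcile(["cambridge analytica", "mossack fonseca", "danske bank"])` would then issue one request per batch instead of one per company.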

To retrieve more information about the records returned by the reconciliation API, you will need to use OpenCorporates' REST API, because their reconciliation endpoint does not yet support the Data Extension API specified by OpenRefine. If you want to use it on more than a few records, you will need to get an API key.
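The two-step approach (reconcile first, then look up details) could be sketched in Python as follows. This is an illustration under several assumptions: that each candidate in a reconciliation response carries an `id` shaped like `/companies/nl/17087985` and a numeric `score`, that the REST API serves that path under `https://api.opencorporates.com`, and that the key is passed as an `api_token` query parameter. Check each of these against the OpenCorporates documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.opencorporates.com"


def top_candidates(candidates, n=3):
    """Return the n highest-scoring candidates from one `result` list."""
    return sorted(candidates, key=lambda c: c.get("score", 0), reverse=True)[:n]


def company_url(candidate_id, api_token=None):
    """Build the REST API URL for a candidate id.

    Assumption: the id is a path such as "/companies/nl/17087985".
    """
    url = API_BASE + "/" + candidate_id.strip("/")
    if api_token:
        # Assumption: the key is sent as the `api_token` query parameter.
        url += "?" + urlencode({"api_token": api_token})
    return url


def fetch_company(candidate_id, api_token=None):
    """Fetch and parse the JSON record for one company."""
    with urlopen(company_url(candidate_id, api_token)) as resp:
        return json.load(resp)
```

For each input name, you would pass its candidate list through `top_candidates` and call `fetch_company` on each remaining `id`, then read the fields you need from the returned JSON.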

pintoch
  • Would this method be effective for say, 1000 rows? Also do you know how to bring back the extra information like registered address etc? Could this be implemented in python if I cannot install OpenRefine on my system due to permission issues? – user9940344 Aug 01 '19 at 15:00
  • See my edited answer above about fetching extra data. You do not need to install OpenRefine at all to use this API, you can totally implement that in Python directly. – pintoch Aug 01 '19 at 15:01
  • Okay will take a look at that. Does the REST API give me the ability to rank the companies based on similarity of name? So could I say take a list of companies and for each company on that list return the top 3 closest matches? Is that something that would be easy enough to implement? – user9940344 Aug 01 '19 at 15:03
  • I would use the reconciliation API to retrieve the candidates first, and then use the REST API to retrieve information about these candidates. – pintoch Aug 01 '19 at 15:09
  • That makes sense. Could you give me a specific example of how to GET this information for just one company? So one query for the reconciliation API to return the score and then one for the REST API to return some other info (maybe address for an example)? – user9940344 Aug 01 '19 at 15:25
  • Here is an example for a single company, from the docs of the REST API: https://api.opencorporates.com/companies/nl/17087985 – pintoch Aug 01 '19 at 15:37
  • Okay, I think all the information is there; I just need to spend some time understanding it, as this is my first time using APIs. Thank you for your help. One final question, if you do not mind: how would I go about doing this in Python? Would I have to get the Python script from Postman and then loop over all the companies I am searching for? Then, for the highest-scoring matched companies, loop over them with the REST API to pull in this information? Does that make sense as a strategy? – user9940344 Aug 01 '19 at 15:41
  • Absolutely! I am not familiar with postman but in Python I would do this using the `requests` library. – pintoch Aug 01 '19 at 15:48
  • Do you know why this GET request https://opencorporates.com/reconcile?query=amazon-uk limited would only return a score of 51 and a false match even though the top record returned is a company with the exact same name? See the output here: https://pastebin.com/hViUiBGH Shouldn't the score be higher and a match found? – user9940344 Aug 01 '19 at 15:59

Use OpenRefine: it has all that you asked for and a lot more, and rewriting it would not be effort well spent.
Fix your permission problems.

Vladimir Alexiev