
I read that the OpenRefine Wikidata plugin always operates in rows mode.

I have data in records mode: each record is a serial/magazine, and the rows in that record are the various formats of the same serial/magazine (typically the paper and electronic versions). Each row has a unique ISSN identifier. Wikidata considers that there is only one item for the serial/magazine (my records), with no separate items for each of the formats (my rows).

When reconciling the data against Wikidata, all rows of the same record will typically match the same Wikidata item, or none of the rows will match, or sometimes only one row of the record will match (e.g. if only the ISSN of one format, say the paper format, is known in Wikidata, but not the others).


What I would like to do is create one item in Wikidata for each record for which no reconciliation result was found (in other words, for which no row has matched), rather than one item per row. When creating this item, I would like to add the ISSNs of all the rows in the record.

Is it possible to do that, and if so, how?

Thanks

ThomasFrancart

1 Answer


Yes, it is possible. You need to perform the reconciliation operation on the first column instead.

  • As mentioned by the documentation, use the Fill down operation on the first column, which defines your records;
  • Reconcile the column to Wikidata;
  • Then use the Create one new item for similar cells action (in the Reconcile -> Actions menu);
  • Create a schema where the first column is used as subject id.

Assuming the values in your first column are initially distinct (which is the case in your example), this will create one item per record.
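To make the effect of these steps concrete, here is a minimal sketch in plain Python of what the fill down followed by "one new item for similar cells" amounts to. It is not OpenRefine code: the column names (`title`, `issn`), the sample values, and the helper functions are all made up for illustration, and it only models the grouping logic, not the reconciliation itself.

```python
# Hypothetical rows as they look in records mode: the key column ("title")
# is blank on the continuation rows of each record. All values are made up.
rows = [
    {"title": "Journal A", "issn": "1111-1111"},
    {"title": "", "issn": "2222-2222"},
    {"title": "Journal B", "issn": "3333-3333"},
]


def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    filled, last = [], None
    for row in rows:
        value = row[column] or last
        last = value
        filled.append({**row, column: value})
    return filled


def group_new_items(rows, key_column, value_column):
    """Group values by identical key cell: one prospective new item per key."""
    items = {}
    for row in rows:
        items.setdefault(row[key_column], []).append(row[value_column])
    return items


filled = fill_down(rows, "title")
new_items = group_new_items(filled, "title", "issn")
print(new_items)
# {'Journal A': ['1111-1111', '2222-2222'], 'Journal B': ['3333-3333']}
```

After the fill down, every row of a record carries the same key value, so treating identical cells as one new item yields one item per record, with all of that record's ISSNs attached to it.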

In your example, because your first column contains ISSNs and not titles, I would first create a root column with titles instead (before the process explained above). In rows mode, facet to keep the first row of each record by selecting non-blank values in the first column, then copy your column with titles and move this new column to the first position. This should ensure that reconciliation picks up existing items. Note that if the same title is used by multiple journals, this will create a single item for all of them, unless you add other properties to your reconciliation configuration (such as ISSN).

pintoch
  • Just to make sure I understand: if I move the title to the first column, should I then 1/ fill down this column, 2/ reconcile based on this column + ISSN number (yes, I need to do that) and 3/ use Create one new item for similar cells? Thanks – ThomasFrancart Oct 11 '19 at 10:25
  • Yes! You could also do the reconciliation before the fill down operation, if you want to use multi-valued properties in reconciliation. – pintoch Oct 11 '19 at 13:40
  • I trust this is the correct solution; however, for other reasons we went for a different process where we build a dataset with "1 potential Wikidata item per line", with a fixed number of columns to store the different ISSNs attached to the ISSN-L. This is much easier to deal with in OpenRefine, and allows us to preprocess the data and apply all the required business rules before actually doing the Wikidata reconcile + import process. – ThomasFrancart Nov 04 '19 at 09:35
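
The "1 potential Wikidata item per line" layout described in the last comment could be prepared outside OpenRefine along these lines. This is only a sketch under assumptions: the column names (`issn_l`, `title`, `issn`), the fixed column count `MAX_ISSNS`, the helper `one_line_per_item`, and the sample values are all hypothetical, since the comment does not give the actual dataset layout.

```python
# Hypothetical input: one row per format, as in the question. The column names
# and all values are made up for illustration.
rows = [
    {"issn_l": "1111-1111", "title": "Journal A", "issn": "1111-1111"},
    {"issn_l": "1111-1111", "title": "Journal A", "issn": "2222-2222"},
    {"issn_l": "3333-3333", "title": "Journal B", "issn": "3333-3333"},
]

MAX_ISSNS = 2  # assumed fixed number of ISSN columns in the output


def one_line_per_item(rows, max_issns=MAX_ISSNS):
    """Collapse format rows into one line per ISSN-L with issn_1..issn_n columns."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["issn_l"], {"title": row["title"], "issns": []}
        )
        entry["issns"].append(row["issn"])

    lines = []
    for issn_l, entry in grouped.items():
        issns = entry["issns"]
        if len(issns) > max_issns:
            raise ValueError(f"{issn_l} has more than {max_issns} ISSNs")
        line = {"issn_l": issn_l, "title": entry["title"]}
        # Pad with empty strings so every line has the same columns.
        padded = issns + [""] * (max_issns - len(issns))
        for i, issn in enumerate(padded, start=1):
            line[f"issn_{i}"] = issn
        lines.append(line)
    return lines


wide = one_line_per_item(rows)
for line in wide:
    print(line)
```

With one line per prospective item, the data can be handled in rows mode throughout, and each ISSN column can be mapped to its own statement in the schema.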