3

I would like to load Wikidata into Virtuoso. After days of searching I was unable to find a tutorial, or even anyone who uses Virtuoso to run SPARQL queries over Wikidata. I would not like to spend money on a server to load 50 GB+ of data for nothing. Why Virtuoso and not Blazegraph, for instance? Because I'm used to using Virtuoso for DBpedia.

FranMercaes
  • You load Wikidata into Virtuoso as you would any other RDF dataset. Or what exactly is the question here? I mean, how to load data is clearly covered in the Virtuoso docs: http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader - or do you have any other question? – UninformedUser Jun 26 '19 at 08:37
  • Actually, I am suffering with this now, since I am trying to load `latest-truthy.nt.bz2` and I am getting this error: `File latest-truthy.nt error 42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content` I think I will create a separate question for it – Mohamed Gad-Elrab Jun 26 '19 at 08:48
  • This repo may be a good start: https://github.com/patrickhoefler/wikidata-virtuoso – Mohamed Gad-Elrab Jun 26 '19 at 08:52
  • The repo noted above by @MohamedGad-Elrab is LONG outdated. I suggest you follow up to [this related issue on the Virtuoso project](https://github.com/openlink/virtuoso-opensource/issues/295), or [create your own new issue there](https://github.com/openlink/virtuoso-opensource/issues/). You could also post to the [OpenLink Community Forum](http://community.openlinksw.com/). – TallTed Jun 26 '19 at 20:37

2 Answers

1

Better late than never, here's a guide that covers the creation and deployment of a Wikidata instance using Virtuoso.

-3

As indicated here and elsewhere, loading Wikidata into Virtuoso should simply be a matter of creating a Turtle file (better, multiple Turtle files) from the download and bulk loading it. To get decent performance, a number of parameters have to be changed in virtuoso.ini.
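The virtuoso.ini parameters that typically need raising are NumberOfBuffers and MaxDirtyBuffers; the stock file ships with commented presets for various amounts of RAM. For concreteness, here is a minimal sketch of the bulk-load sequence driven from Python. The data directory, graph IRI, port, and dba credentials are placeholders, and the client binary may be named isql or isql-v depending on the install.

    import subprocess

    # Placeholders: adjust host:port, credentials, data directory, and graph IRI.
    ISQL = ["isql", "1111", "dba", "dba"]   # client may be "isql-v" on some installs

    def run_sql(statement: str) -> None:
        """Execute one SQL statement through Virtuoso's isql client."""
        subprocess.run(ISQL + [f"exec={statement}"], check=True)

    # Register every matching file with the bulk loader, run the loader, then
    # checkpoint so the loaded data is made durable (see the VirtBulkRDFLoader docs).
    run_sql("ld_dir('/data/wikidata', '*.ttl', 'http://www.wikidata.org/');")
    run_sql("rdf_loader_run();")
    run_sql("checkpoint;")

With the dump split into many files, the bulk-loader documentation suggests starting several rdf_loader_run() calls in parallel from separate isql sessions to make use of more cores.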

There is a problem loading Wikidata into Virtuoso, however, due to a long-standing bug in Virtuoso's implementation of geo-coordinates. Getting around it requires patching Virtuoso and is not for the faint of heart.

Here are the instructions on what to do to get the open-source version of Virtuoso to load Wikidata. Note that the patching of the geo-coordinate code might cause problems when using the resulting KB.

1/ Patch the geo-coordinate literal code, editing virtuoso-opensource/libsrc/Wi/rdfbox.c to comment out two pieces of code that check for non-terrestrial coordinates. Note that this is a bug in Virtuoso and that Wikidata conforms to the specification of this datatype.

    /* non-terrestrial coordinates
    if (RDF_BOX_GEO_TYPE == type && DV_GEO != box_dtp && DV_LONG_INT != box_dtp)
      sqlr_new_error ("42000", "RDFGE", "RDF box with a geometry RDF type and a non-geometry content");
    */

    /* non-terrestrial coordinates
    if (type == RDF_BOX_GEO && box_dtp != DV_GEO)
      sqlr_new_error ("22023", "SR559", "The RDF box of type geometry needs a spatial object as a value, not a value of type %s (%d)", dv_type_title (box_dtp), box_dtp);
    */

2/ Patch the Turtle loader, editing the end of rdf_rl_lang_id in virtuoso-opensource/libsrc/Wi/ttlpv.sql to look as follows. Note that this is another bug in Virtuoso that is triggered by parallel loading of langstrings with different languages.

    id := sequence_next ('RDF_LANGUAGE_TWOBYTE', 1, 1);
    --pfps insert into rdf_language (rl_twobyte, rl_id) values (id, ln);
    insert soft rdf_language (rl_twobyte, rl_id) values (id, ln);
    commit work; -- if load non transactional, this is still a sharp transaction boundary.
    log_enable (old_mode, 1);
    --pfps get the actual id, as it may be different
    id := (select RL_TWOBYTE from DB.DBA.RDF_LANGUAGE where RL_ID = ln);
    rdf_cache_id ('l', ln, id);
    return id;

  • Could you please be more specific in how it can be made working? – Wolfgang Fahl May 09 '20 at 15:26
  • Note that the speed of loading Wikidata into Virtuoso is much improved if you split the Wikidata dump into multiple pieces (ideally hundreds). This also is not trivial; one way to do the split is sketched after these comments. – Peter F. Patel-Schneider May 10 '20 at 16:38
  • Also, if you are loading the complete dump (with over 10 billion triples) you need a machine with at least 256GB of main memory (maybe even at least 512GB) and an SSD to store the DB. A machine with less memory will start thrashing part way through the load and progress will be *very* slow thereafter. – Peter F. Patel-Schneider May 10 '20 at 16:41
  • http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Test_with_Apache_Jena shows my current attempt with a different SPARQL store. I'll test Blazegraph and Virtuoso again after that. The official Blazegraph import also splits the triples. Still, I'd appreciate detailed documentation of the process to be included in the wiki mentioned above. – Wolfgang Fahl May 10 '20 at 17:34
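For what it's worth, here is a minimal sketch of one way to do the split mentioned in the comments above, assuming the line-oriented latest-truthy.nt.bz2 N-Triples dump (one complete triple per line, so a plain line split is safe; this would not work for prefixed Turtle). File names and chunk size are arbitrary placeholders.

    import bz2
    import gzip

    DUMP = "latest-truthy.nt.bz2"     # N-Triples dump: one complete triple per line
    LINES_PER_CHUNK = 10_000_000      # arbitrary; pick a size that yields hundreds of files

    def split_dump() -> None:
        """Stream the bz2 dump and rewrite it as numbered gzipped chunks."""
        chunk, written, out = 0, 0, None
        with bz2.open(DUMP, "rt", encoding="utf-8") as src:
            for line in src:
                if out is None or written == LINES_PER_CHUNK:
                    if out is not None:
                        out.close()
                    out = gzip.open(f"wikidata-part-{chunk:04d}.nt.gz", "wt", encoding="utf-8")
                    chunk += 1
                    written = 0
                out.write(line)
                written += 1
        if out is not None:
            out.close()

    if __name__ == "__main__":
        split_dump()

Writing the chunks gzipped keeps disk usage manageable, and as far as I know the Virtuoso bulk loader can register gzipped files directly.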