Best practices for data management in Prolog

Question

I'm just getting involved in using Prolog to handle more than just the simplest forms of data (facts) and am looking for some guidance from the seasoned Prologers...

If I want to manage data or facts dynamically, I have a few major choices. Do I:

Manage the data as assertions in Prolog, OR
Interface to a database from Prolog, OR
Possibly a combination of both

If I manage facts as assertions in Prolog, I also have the question of the best way to represent those facts. Let's suppose I have a person who has a first name, last name, and an age. I can assert it as:

person(first_name(_), last_name(_), age(_)).

Or have an implicit assumption of what the attributes of person are:

person(_, _, _).  % first name, last name, age

If I want to associate a person with something else, I really need a key for a person. So I might be inclined to assert a person as:

person(id(_), ...).  % Maintain id as a unique person key; or done implicitly as above

Of course, now I'm making my Prolog assertions look like relational database table entries. And it makes me wonder if I'm taking the wrong approach and overly complicating the representation of facts.

So really, my question is: are there best practices to consider when managing medium to complex data in Prolog? The naming convention is the small side of it. I've read that assert/retract in Prolog is inefficient, so I'm also wondering how to approach the data organization itself, such as when to resort to an external SQL database versus a Prolog-only representation.

ADDENDUM

I would assume that using a key for records, as a relational database does, is desirable for the very reasons a relational database uses one. That means the key must be maintained. It seems cumbersome to do this manually (explicitly) in Prolog for each case, so how is this generally done? Or is my assumption incorrect?

mat · Accepted Answer · score 12 · 2013-06-10T20:10:06.767

Consider using a more descriptive name for your predicate, for example:

id_fname_lname_age(_, _, _, _).

This explicitly denotes what the arguments are without needing any additional structures.

In my opinion, a good rule of thumb for naming predicates is to describe the arguments in the order they appear in, using declarative names, separated by underscores.
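To illustrate the convention, here is a small sketch with a few facts and a derived predicate whose name follows the same rule. The data and the lname_fname/2 predicate are invented for this example:

```prolog
% id_fname_lname_age(Id, FirstName, LastName, Age): one fact per person.
id_fname_lname_age(1, ann,  lee,   41).
id_fname_lname_age(2, bob,  smith, 27).
id_fname_lname_age(3, cleo, lee,   35).

% lname_fname(LastName, FirstName): the name states which argument is which,
% so callers can bind any argument without extra wrapper terms.
lname_fname(LastName, FirstName) :-
    id_fname_lname_age(_, FirstName, LastName, _).
```

A query such as `?- lname_fname(lee, F).` then enumerates `F = ann` and `F = cleo` on backtracking.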

EDIT: As to your additional questions: assertz/1 is slow (and has many other disadvantages) in comparison to a nicely declarative programming style that simply passes arguments between predicates that do not intrinsically require any modifications of the clause database. When you really need to assert additional facts because you are using Prolog like a relational database system, then assertz/1 is one way to do it (other options are mentioned in other answers here), and will likely be comparable in efficiency to any other relational database system for many usage scenarios. As already mentioned, several modern Prolog systems perform just-in-time indexing on all arguments, and you therefore need not explicitly declare any "keys".
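A minimal sketch of the assertz/1 route, reusing the id_fname_lname_age/4 naming from above; add_person/4 and set_age/2 are hypothetical helper names. Note that there is no in-place update: changing a fact means retracting the old clause and asserting a new one.

```prolog
% The "table" must be dynamic so clauses can be added and removed at run time.
:- dynamic id_fname_lname_age/4.

% Insert a new row.
add_person(Id, First, Last, Age) :-
    assertz(id_fname_lname_age(Id, First, Last, Age)).

% "Update" a row: remove exactly one matching clause and assert a replacement.
% Fails if there is no person with this Id.
set_age(Id, NewAge) :-
    once(retract(id_fname_lname_age(Id, First, Last, _OldAge))),
    assertz(id_fname_lname_age(Id, First, Last, NewAge)).
```

For example, `?- add_person(1, ann, lee, 41), set_age(1, 42), id_fname_lname_age(1, _, _, Age).` yields `Age = 42`. A lookup that binds only the last-name argument, such as `id_fname_lname_age(Id, _, lee, _)`, needs no key declaration; in systems with just-in-time indexing, the system itself decides whether to build an index on that argument.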

edited Jun 10 '13 at 20:10

answered Jun 08 '13 at 14:07

mat


Thanks. So that's a good approach to naming at the lowest level. What about the bigger picture? How 'crazy' does one get with asserting lots of interrelated data as Prolog facts, using that naming convention as an example, before one decides it's more appropriate to hook Prolog into an SQL database? – lurker Jun 08 '13 at 14:13

There are many projects using Prolog for large databases with lots of interrelated facts (see for example semantic web use cases for SWI-Prolog etc.) without needing to resort to SQL, so you can probably go quite far with plain Prolog, especially since modern systems also perform just-in-time indexing and various other techniques that yield good performance in many use cases. I think a good, descriptive, naming convention for Prolog facts makes it even easier to formulate complex queries in Prolog than in SQL. – mat Jun 08 '13 at 14:22

Indeed, I've read with great interest some posts here on stackoverflow.com about such examples, which is what triggered my question. If I start asserting facts for interrelated data, it gets hairy pretty quickly, and I wondered about how that's managed in general in Prolog in terms of best practices for naming, interlinking, and defining basic predicates for data management. – lurker Jun 08 '13 at 14:25

Great suggestion. Would be nice to see this more supported by development tools. – CapelliC Jun 09 '13 at 07:56

score 6 · Answer 2 · 2013-06-10T09:52:55.033


No one has yet addressed your question regarding efficiency when using assert/retract.

For SWI-Prolog, in a nutshell: facts are indexed just-in-time (that is, when they are first queried), and lookup is very efficient (based on hash tables). By default indexing is only on the first argument, but there are built-ins for working around this (I guess it would be a pain to keep everything in a normalized form).

The rule of thumb seems to be: as long as all your data fits in memory and you don't assert/retract too often, it is the best choice. You can use library(persistency) to make a predicate persistent.
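A minimal sketch of library(persistency), assuming SWI-Prolog; the person/2 relation and the open_people_db/1 and set_person_age/2 helpers are hypothetical. The persistent/1 declaration keeps person/2 as ordinary dynamic facts in memory and journals every change made through the generated assert_person/2, retract_person/2 and retractall_person/2 predicates to the attached file:

```prolog
:- use_module(library(persistency)).

% Declare person/2 as persistent. The library generates assert_person/2,
% retract_person/2 and retractall_person/2, which update both the
% in-memory clauses and the journal file. Field types are checked on assert.
:- persistent person(name:atom, age:nonneg).

% Attach the database to a journal file; existing entries are loaded.
open_people_db(File) :-
    db_attach(File, []).

% Record a person's age, replacing any previous value.
set_person_age(Name, Age) :-
    retractall_person(Name, _),
    assert_person(Name, Age).
```

After `?- open_people_db('people.db'), set_person_age(alice, 31).`, the fact `person(alice, 31)` is queried like any other fact and is reloaded from the journal the next time the file is attached.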

As for things like constraints and triggers, I guess you would have to write your own predicates, but with Prolog's syntax this should not be more verbose than defining them in SQL (my experience with relational databases is quite limited, though, so I might be talking out of my ass).
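For example, a primary-key check, a foreign-key check and an auto-increment key (the question raised in the ADDENDUM) can each be a few lines of ordinary Prolog. This is a minimal sketch: all predicate names are hypothetical, and the counter is not safe if several threads insert concurrently.

```prolog
:- dynamic id_fname_lname_age/4.
:- dynamic owner_item/2.
:- dynamic last_person_id/1.

last_person_id(0).

% "Auto-increment" key: bump a counter fact and return the new value.
next_person_id(Id) :-
    retract(last_person_id(Prev)),
    Id is Prev + 1,
    assertz(last_person_id(Id)).

% Primary-key style constraint: refuse to insert a duplicate Id.
insert_person(Id, First, Last, Age) :-
    (   id_fname_lname_age(Id, _, _, _)
    ->  throw(error(permission_error(insert, person_id, Id), _))
    ;   assertz(id_fname_lname_age(Id, First, Last, Age))
    ).

% Insert a new person under a freshly generated key.
new_person(First, Last, Age, Id) :-
    next_person_id(Id),
    insert_person(Id, First, Last, Age).

% Foreign-key style constraint: PersonId must refer to an existing person.
insert_owner_item(PersonId, Item) :-
    (   id_fname_lname_age(PersonId, _, _, _)
    ->  assertz(owner_item(PersonId, Item))
    ;   throw(error(existence_error(person_id, PersonId), _))
    ).
```

The "constraints" are simply checks placed in the only predicates allowed to modify the facts, which is roughly the role triggers and key declarations play in SQL.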

edited Jun 10 '13 at 09:52

answered Jun 09 '13 at 17:02

Perhaps the topic of another best-practice question, but I'm a user of `gprolog`, yet a lot of answers to Prolog questions are given in terms of SWI-Prolog built-in predicates which don't exist in `gprolog`. I have this feeling that perhaps I should migrate to SWI-Prolog... – lurker Jun 09 '13 at 21:19
I'm quite adept at SQL and I can affirm your remark. SQL is extremely verbose, and the more you do with it, the more verbose it gets, not less. – Daniel Lyons Jun 10 '13 at 06:15
@mbratch I use GNU Prolog from time to time (trying to make more of an effort lately as I try to get solid on ISO vs. SWI) but I almost always switch back in short order because it just feels so cramped compared to SWI. – Daniel Lyons Jun 10 '13 at 06:16
@mbratch I think this summarizes it very neatly: http://www.swi-prolog.org/pldoc/doc_for?object=section(2,'1.1',swi('/doc/Manual/swiprolog.html')) – Jun 10 '13 at 09:57
@Daniel, from what I've been reading on stackoverflow and elsewhere, it appears SWI is ISO compliant plus it has a variety of built-ins that GNU Prolog lacks that make life easier. I've also found that assert/retract performs much better on SWI, whereas general inference/traversal appears to be noticeably slower. This is based on only a very few sample cases. – lurker Jun 10 '13 at 11:21

score 5 · Answer 3 · edited Mar 22 '22 at 12:34

Prolog is based on a relational data model.

So a relational data model is, trivially, adequate for Prolog, although I personally miss the metadata facilities you get with SQL DML. Documentation, when available, can easily go out of sync, and it's a pain to handle relations with many columns, partly because Prolog is untyped and partly because you cannot (easily) refer to columns by name: Prolog lacks the projection operator available in relational algebra (and SQL, of course). SWI-Prolog has library(record) to overcome the problem, but I don't like it too much.
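To make the "call by name" point concrete, here is a minimal sketch using SWI-Prolog's library(record) (not portable to other systems); the person record and the birthday/2 predicate are hypothetical examples. The library generates accessors named after the fields, so code refers to `age` rather than to "the third argument":

```prolog
:- use_module(library(record)).

% Declares a person/3 record. library(record) generates predicates such as
% make_person/2, person_first_name/2, person_age/2 and set_age_of_person/3.
:- record person(first_name:atom, last_name:atom, age:nonneg = 0).

% Read and update a field by name instead of by argument position.
birthday(Person0, Person) :-
    person_age(Person0, Age0),
    Age is Age0 + 1,
    set_age_of_person(Age, Person0, Person).
```

For example, `?- make_person([first_name(ann), last_name(lee), age(41)], P0), birthday(P0, P), person_age(P, Age).` gives `Age = 42`. The records are still plain terms, so the naming only helps code that goes through the generated accessors.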

Generally, when it comes to 'real world' data modelling, such as deeply nested (XML/HTML/SVG/whatever) representations, dimensionally indexed entities like spatial and geographical DBs, or large graphs like those required by today's ontologies, purely relational data modelling can be inadequate.

You must supply the missing details yourself, and technically this can be very complex. If you need some kind of indexing your Prolog engine doesn't provide, you will get buried in writing difficult interfaces in low-level languages (usually C). So why not use an easier language, with ready-to-use (and debugged) libraries modelled on that complex data? There are plenty of them.

As a consequence, SWI-Prolog, whose development is driven by practicality rather than by the abstract language research (on both natural and synthetic languages) that was the initial focus of Prolog applications, has specialized interfaces, for instance for the Web and for ontologies. See the packages page: most of them are well-crafted interfaces to complex data.

From a software engineering perspective, the availability of such interfaces makes a difference in language choice. Just to underline how SWI-Prolog's reputation is rising: it was recently nominated (like Python) for the Dutch ICT innovation award.

Ongoing development, like quasi-quotations for embedding JavaScript in DCG-based HTML generation, and great support from the SWI-Prolog mailing list add great value!

Personally, I'm dedicating my efforts to learning RDF modelling by applying it to practical tasks.

score 3 · Answer 4 · edited Mar 21 '22 at 13:39


Boris, I recently made this assertion (or nearly) on the swipl list: "The best way to save it is to use qsave_program and not just a text file with all facts." Jan made a convincing argument that using library(persistency) was a better option. I think the days of save_state as a persistence mechanism are gone.

edited Mar 21 '22 at 13:39

Erik Kaplun


answered Jun 09 '13 at 19:07

Anniepoo


I cannot find it. Do you mean "persistency"? – Jun 09 '13 at 20:16
Link for the curious: http://www.swi-prolog.org/pldoc/doc/swi/library/persistency.pl – Daniel Lyons Jun 10 '13 at 06:17
I think that qcompile (or better, [load_files](http://www.swi-prolog.org/pldoc/man?predicate=load_files/2)) should be used for efficiency – CapelliC Jun 10 '13 at 08:20
Thank you for the information, I have not been on the list that long or I have missed this conversation. – Jun 10 '13 at 09:55

score 2 · Answer 5 · answered Jun 09 '13 at 02:34


If you're interested in using your first format, I'd highly recommend using a list inside the predicate, like so:

person([first_name(_), last_name(_), age(_)]).

This way you can add or remove things as you want. It also makes it easier to grab info out of a particular piece:

?- person(P), member(first_name(Name), P).
P = [first_name(dave), last_name(hardy), middle_name(robert), age(27)],
Name = dave .

This method also makes it really easy to maintain lists of the data, in case you don't want to have the data permanently asserted.

answered Jun 09 '13 at 02:34

Raceimaztion


This is the strangest advice I've seen given around here in a while. Doing a query against more than one structure is not hard, but using lists inside terms like this *will* defeat Prolog's indexing system and lead to a lot of unnecessary inefficiency. – Daniel Lyons Jun 09 '13 at 04:01

I agree with Daniel. You cannot model complex data that way in Prolog. I see lists as historically (and naively) providing the link between relational and functional programming. IMHO they never merged well. – CapelliC Jun 09 '13 at 08:00

score 0 · Answer 6 · answered Jun 09 '13 at 04:16


Download the Prolog version of WordNet and take a look at what's going on in there:

What would be a relational database table is a separate file.
If you must, generate an integer ID and put it in the first position. WordNet chose only to give the word senses their own IDs.
Document what goes in each argument position.

The other proposals here seem unnecessarily burdensome to me. If you are content with only Prolog accessing this data, then store it in Prolog's format and make life easy on you while you use Prolog. If Prolog is going to be just one of several languages accessing the data, stick it in a relational database. The burden of getting to it from Prolog will be offset by everything else being easier.

Migrations are not terribly hard to fake with Prolog. Take advantage of listing/1:

%   save_database(+Functor, +Filename)
%
% Records all the facts of Functor in Filename
save_database(Functor, Filename) :-
  telling(OldStream), tell(Filename),
  listing(Functor),
  told, tell(OldStream).

e.g., save_database(foo/1, 'foo.pl'). You can easily write data migrations on top of this. I really don't see a use case that justifies the greater complexities suggested in the other answers.
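For comparison, here is a minimal sketch of the same idea with explicit streams, assuming SWI-Prolog (setup_call_cleanup/3 and portray_clause/2 are widely available but not ISO; subsumes_term/2 is ISO). save_facts/2 and load_facts/2 are hypothetical names. The cleanup goal closes the file even if writing fails, and the loader asserts only terms that are facts of the expected predicate instead of executing whatever the file contains:

```prolog
% save_facts(+Name/Arity, +File)
% Writes every fact (clause whose body is `true`) of Name/Arity to File,
% quoted so that it can be read back with read_term/3.
save_facts(Name/Arity, File) :-
    functor(Head, Name, Arity),
    setup_call_cleanup(
        open(File, write, Out),
        forall(clause(Head, true), portray_clause(Out, Head)),
        close(Out)).

% load_facts(+Name/Arity, +File)
% Reads File term by term and asserts only facts of Name/Arity.
load_facts(Name/Arity, File) :-
    functor(Template, Name, Arity),
    setup_call_cleanup(
        open(File, read, In),
        read_facts(In, Template),
        close(In)).

read_facts(In, Template) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  true
    ;   subsumes_term(Template, Term)      % a fact of the expected predicate
    ->  assertz(Term),
        read_facts(In, Template)
    ;   throw(error(domain_error(fact, Term), _))
    ).
```

A data migration can then be a load, a transformation over the asserted facts, and a save, with any unexpected term in the file reported as an error.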

answered Jun 09 '13 at 04:16

Daniel Lyons


1. tell/told is a 30-year-old legacy style that often leads to problems in case of errors. Think of (something like) listing failing halfway through. 2. In many systems, using listing that way can lead to code injection. – false Jun 09 '13 at 16:14
1. I like Edinburgh I/O, it's perfectly fine for a mere sketch, which is all this is. 2. If you read arbitrary terms and then execute them, it's a code injection with or without `listing/1`. Give me an example where `listing/1` intrinsically gives rise to code injection, and I'll be interested, but if it's the usual "don't execute exactly what you're given" there's nothing special that deserves a remark. – Daniel Lyons Jun 10 '13 at 06:11
ad 1: You write: "You can easily write data migrations on top of this", however, this kind of I/O leads to unintentionally mixing the text output and other forms of corrupt but hard-to-detect errors. Clearly not a good idea in case of a migration. ad 2: Please show on SO a corresponding discussion about injection - say for SQL. Just to see how this is carried out on SO. – false Jun 10 '13 at 10:11
You can change the I/O to ISO without affecting the meaning. With SQL injection, the problem is not storing user data but interpreting it unsafely. This is always true when dealing with user input, so unless there is a special problem with `listing/1` I don't find the objection noteworthy. – Daniel Lyons Jun 10 '13 at 13:36

Erik Kaplun · Answer 7 · 2022-03-22T12:52:47.640

I haven't personally tried this approach yet, but I found a simple example/introduction. Turtle is used for a concise RDF data definition, and then the pure Prolog rdf/3 predicate is used to query the RDF data without any external query language, relying only on Prolog's backtracking:

@prefix ex: <http://example.org/ns#> .

ex:user1
  ex:name "Annie" ;
  ex:email "annie@example.com" .

ex:drawing1
  ex:title "Railroad Car" ;
  ex:author ex:user1 .

Finding the author name:

drawing_author_name(Drawing, Name):-
  rdf(Drawing, ex:author, Author),
  rdf(Author, ex:name, Name).

This could also be applied in pure Prolog, without RDF:

entity(user1). % optional
% entity_name/2 rather than name/2: name/2 is a built-in predicate
entity_name(user1, "John").
email(user1, "john@gmail.com").

entity(drawing1). % optional
% ref/1 wrapper distinguishes references from plain atoms
author(drawing1, ref(user1)).
title(drawing1, "Railroad Bus").

drawing_author_name(Drawing, Name):-
  author(Drawing, ref(Author)),
  entity_name(Author, Name).

Erik Kaplun · Answer 8 · 2022-03-28T21:19:59.183

Update:

My first response suggested using terminus_store_prolog to access TerminusDB's data store directly; however, further research reveals that TerminusDB can also be accessed at a higher level from Prolog by using its core modules (not the client API, but not the raw store either). See the following forum post for a detailed explanation:

https://discuss.terminusdb.com/t/direct-access-to-terminus-store/291/2

———-

There is the collaborative graph database TerminusDB, which is written in Prolog. It seems to mainly advertise its JS and Python client libraries, but a closer look also reveals a Prolog bindings library to its data store:

Create a new directory (testdir in this example), then do the following:

open_directory_store("testdir", Store),
open_write(Store, Builder),
create_named_graph(Store, "sometestdb", DB),
nb_add_triple(Builder, "Subject", "Predicate", value("Object")),
nb_commit(Builder, Layer),
nb_set_head(DB, Layer).

Add a triple to an existing named graph

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
open_write(DB, Builder),
nb_add_triple(Builder, "Subject2", "Predicate2", value("Object2")),
nb_commit(Builder, Layer),
nb_set_head(DB, Layer).

Query triples

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
head(DB, Layer),
triple(Layer, Subject, Predicate, Object).

Convert strings to ids and query by id

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
head(DB, Layer),
subject_id(Layer, "Subject", S_Id),
id_triple(Layer, S_Id, P_Id, O_Id),
predicate_id(Layer, Predicate, P_Id),
object_id(Layer, Object, O_Id).
