-1

I am implementing a (currently small) subset of a Prolog engine in Java. I am trying to speed up the unification of terms. Particularly, what concerns String comparisons.

My idea is to intern (by means of the String.intern() method) all the strings representing atoms, so two atoms can be considered equals using reference comparison (==) instead of invoking the String.equals() method.

After reading the answers to similar questions in this site (e.g., When should we use intern method of String on String constants) these are my conclusions:

  • I should gain on performance regarding atom comparisons.
  • Memory footprint should also be a bit lower using the intern method.
  • Atom instantiation may be slowed down a bit.

Is there a disadvantage I am not foreseen ? Or is there a better alternative to improve performance and reduce memory footprint in scenarios involving a big amount of data ?

Community
  • 1
  • 1
Sergio
  • 8,532
  • 11
  • 52
  • 94

2 Answers2

0

The problem with intern is that it only works for strings you have created, as other strings coming in you cannot guarantee will use the interned one. You can fix that by interning all the strings, but that is slow in of itself. You need to measure that impact to see if you are saving anything in the big picture.

A better approach may be to define objects to represent atoms and do your own mapping/caching of those objects to ensure they are correct for the comparisons you are doing.

Use a factory:

 public class AtomFactory {
      Map<String, Atom> atoms;
      public static Atom buildAtom(String atom) {
          return atoms.get(atom);
      }
 }

If you know your full list of atoms ahead of time you can prefil the mapping and change the method name to getAtom, if not you will need to add new entries to it as you go on and potentially will need to synchronize access for thread safety.

Tim B
  • 40,716
  • 16
  • 83
  • 128
  • I do have an object representing an Atom that receives a String in its constructor. My idea is to invoke intern() on such string so all the atoms will use the intern representation. – Sergio Dec 29 '13 at 08:54
  • Use a factory instead and only have one Atom object for each String. See edit. – Tim B Dec 29 '13 at 11:02
  • but why ensuring that an atom has a single representation should be faster than using interned strings? I mean, using internally a factory that relies on a map of atoms also has a performance penalty. Also, as I said in my previous comment, if I call to intern() in the Atom constructor I do not need to worry about how the string was instantiated, since I will always keep the intern string. – Sergio Dec 29 '13 at 20:05
0

Using interned strings for atoms looks perfectly appropriate.

However, for each atom, you probably have a unique Atom object anyway. So when you compare atoms at the Prolog level, you would only compare the Atom object reference, not the strings contained within. So you won't gain anything in atom comparison.

The places in a Prolog system where you have to find the Atom object, given the name string, are very limited: basically the read-predicates and the atom_codes/chars/string conversion primitives. These would benefit from having the string interned.

In your system, you will probably also have an object for every functor, i.e. Name/Arity pair. Some predicates, in particular functor/3, must find a functor from the corresponding atom and vice versa. Depending on how you organize your data structures, this may or may not involve comparing the string.

jschimpf
  • 4,904
  • 11
  • 24