Code

I'm Not Mutable, I'm Partially Instantiated

Incomplete data structures challenge our notion of mutability

November 6, 2024

Chapter 15 from The Art of Prolog contains this dictionary implementation:

lookup(Key, dict(Key,X,Left,Right), Value) :-
    !,X=Value.
lookup(Key, dict(Keyl,X,Left,Right), Value) :-
    Key < Keyl, lookup(Key,Left,Value).
lookup(Key, dict(Keyl,X,Left,Right), Value) :-
    Key > Keyl, lookup(Key,Right,Value).

Believe it or not, these six lines of code implement a dictionary as an ordered, binary search tree. The rule lookup/3 can be used to both add to and get from the dictionary.

This query calls lookup/3 with a key, a variable, and a value. The variable is instantiated to a new dictionary containing the key-value pair and the variables _A and _B in the left and right branches:

?- lookup(123, Dict, "foo").
   Dict = dict(123,"foo",_A,_B).

This query includes a second clause which adds another key-value pair, which is placed in the right branch of the first pair:

?- lookup(123, Dict, "foo"),
   lookup(456, Dict, "bar").
   Dict = dict(123,"foo",_A,dict(456,"bar",_B,_C)).

Immutable?

Since values in logic programming are immutable - why does adding a key-value pair to the dictionary not need to copy and return a new version of the dictionary?

The answer is because it’s an incomplete data structure. The binary tree’s left and right branches are variables, whose values can be determined later. So the lookup rule traverses the tree until it finds a matching key, or it finds a variable and “unifies” that branch making its key and value match its arguments.

Instead of being mutable, the dictionary is partially-instantiated. Think of it as an unfinished sketch of a binary tree, where instead of nil, the leaves have a value of “TBD”.

Yet it’s immutable since once a concrete value is specified, it cannot be changed without making a copy.

A more commonly encountered incomplete data structure is the difference list. These are partially-instantiated linked-lists with a variable as its tail. Prepending values to the tail appends them to the list in constant time.

Refactoring the dictionary

To get a key’s value, we provide the value as a variable:

?- lookup(123, Dict, "foo"),
   lookup(123, Dict, Value).
   Dict = dict(123-"foo",_A,_B), Value = "foo".

Checking for existence is a little tricky though:

?- lookup(123, Dict, Value).
   Dict = dict(123-Value,_A,_B).

The query “succeeded” but the value didn’t exist, we got a variable back! We can use nonvar/1 to check that our variable contains a value

?- lookup(123, Dict, Value), nonvar(Value).
   false.

The Art of Prolog authors Sterling and Shapiro note some other limitations of the dictionary:

If the key is non-numeric, the predicates < and > must be generalized. The cut is necessary ... because of the nonlogical nature of comparison operators, which will give errors if keys are not instantiated.

— The Art of Prolog, Chapter 15.3,

But that was twenty years ago. Let’s refactor it!

lookup(Key, dict(Key1-Value1, Left, Right), Value) :-
  (   Key = Key1 ->
      Value = Value1
  ;   compare(<, Key, Key1) ->
      lookup(Key, Left, Value)
  ;   lookup(Key, Right, Value)
  ).

This version uses a single rule, with an if/then control structure to avoid leaving behind choicepoints, since we know each key in the dictionary is unique.

Instead of a numeric comparison, it uses compare/3 to sort any key types. This is a newer predicate which uses Prolog’s standard order so keys can be strings, atoms, numbers, it doesn’t matter.

We also store pairs as the pair type (Key-Value), instead of two separate values. This makes easy to serialize a dictionary into a list of pairs, which are sortable using the builtin keysort/2:

Extending our previous query:

?- lookup(123, Dict, "foo"),
   lookup(456, Dict, "bar"),
   to_list(Dict, Ls),
   keysort(Ls, Ms).
   Ls = [456-"bar",123-"foo"],
   Ms = [123-"foo",456-"bar"].

Finally, we can make our code clearer by wrapping lookup/3 with two new rules: add/3 and get/3.

The full implementation is on GitHub.

Tags: prolog data-structures difference-lists