Perl Closures As Objects
How lexical scope and anonymous functions can create powerful object systems
December 13, 2020
Perl’s object system is not one of its most admired qualities. Included in the 1994 Perl 5.0 release, objects were a bolt-on. A big improvement at the time, in today’s context the Perl 5 object system requires too much boilerplate and is under-powered compared to other language offerings (no private state, no type checking, no traits, no multimethods). Perl programmers have been trying to upgrade it for years (Cor is a recent example).
Combining a few concepts can lead to great power; 60 years ago in the LISP Programmer’s Manual John McCarthy showed how a Lisp interpreter could be created from simple parsing rules, a few types and just five (!) elementary functions.
Two things Perl 5 got right were its lexical scoping rules and its support for anonymous functions (“lambdas”). Combine those features and you can make closures. And just what are closures good for? Well, it turns out they’re pretty damn powerful; powerful enough, in fact, to make a better object system than Perl’s built-in offering.
Private state
Perl objects are “blessed” data structures, which means data plus its package subroutines. Here’s a Point class:
    package Point;
    sub new {
        my ($class, $x, $y) = @_;
        return bless { x => $x, y => $y }, $class;
    }
    sub x { $_[0]->{x} }
    sub y { $_[0]->{y} }
    sub to_string {
        my $self = shift;
        return sprintf 'x: %d, y: %d', $self->x, $self->y;
    }
The new subroutine is (by convention) the object constructor method. It accepts x y coordinates, and blesses a hashref of that data into a Point object. This associates all the subroutines in the package Point with the object (x, y, to_string and oops! it gets new as well). As a Point object is just a hashref, any consuming code is able to modify the object data directly, even if no setter method was provided:
my $p = Point->new(3, 10);
$p->{x} = 5; # methods schmethods
Score one for convenience, strike one for (lack of) encapsulation. Here’s the same Point class, implemented using a closure:
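    package Point;
    use feature 'current_sub';   # enables __SUB__ (Perl 5.16+)
    sub new {
        my ($class, $x, $y) = @_;
        # each method is a closure over the lexicals $x and $y
        my %methods = (
            to_string => sub { "x: $x, y: $y" },
            x         => sub { $x },
            y         => sub { $y },
        );
        # the returned function is the object: it dispatches by method name
        sub {
            my $method_name = shift;
            $methods{$method_name}->(__SUB__, @_);
        };
    }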
In this case new returns an anonymous function which performs the method resolution itself. Because the x and y coordinates are copied into the scope of the anonymous function, it has “closed over” the lexical environment and calling code has no way of altering those variables without using its public interface:
    my $p = Point->new(1, 5);
    say $p->('x');          # 1
    say $p->('y');          # 5
    say $p->('to_string');  # x: 1, y: 5
As its constructor does not provide any setter methods, its x y coordinates cannot change. It is immutable. The object also does not get its package subroutines, i.e. it has no new method, which stays where it belongs, in the Point package.
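The closure version can be tried end to end with a short script. This is a minimal sketch rather than the article's exact code: the unknown-method check and its error message are assumptions added here, since the listing above skips error handling.

```perl
use strict;
use warnings;
use feature qw(say current_sub);
use Carp ();

package Point {
    sub new {
        my ($class, $x, $y) = @_;
        my %methods = (
            to_string => sub { "x: $x, y: $y" },
            x         => sub { $x },
            y         => sub { $y },
        );
        # The returned closure is the object: it looks the method up by name.
        return sub {
            my $method_name = shift;
            # Assumption: die with a readable message on unknown methods.
            my $method = $methods{$method_name}
                or Carp::croak("Point has no method '$method_name'");
            $method->(__SUB__, @_);
        };
    }
}

my $p = Point->new(1, 5);
say $p->('to_string');               # x: 1, y: 5
say ref($p);                         # CODE: a plain coderef, not a blessed hash
my $ok = eval { $p->('new'); 1 };    # dies: "new" stays in the package
say 'no such method' unless $ok;
```

Because the object is a plain code reference, there is no hash for a caller to reach into; the only way in is the dispatch function.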
Making it re-usable
So far so good. But what if I wanted to make other classes which work in the same way? If I have to copy-and-paste this pattern around, it’s not buying me much. Instead I’m going to introduce a new package which builds classes:
    package Class::Lambda;
    use feature 'current_sub';   # enables __SUB__ (Perl 5.16+)
    sub new_class {
        my ($class_name, $properties, $methods) = @_;
        my $class_methods = {
            properties => sub { %$properties },
            methods    => sub { %$methods },
            name       => sub { $class_name },
            new        => sub {
                my $class = $_[0];
                my %self;
                # defaults first, then constructor args, then the dispatcher
                %self = (%$properties,
                         %{$_[1]},
                         self => sub {
                             my $method_name = shift;
                             $methods->{$method_name}->(\%self, @_);
                         });
                $self{self};
            },
        };
        # the class object dispatches class methods by name
        my $class = sub {
            my ($method_name) = shift;
            $class_methods->{$method_name}->(__SUB__, @_);
        };
        $methods->{class} = sub { $class };
        $class;
    }
The new_class subroutine takes a class name, a hashref of properties for object state (name and default value), and a hashref of methods (method name and anonymous subroutine). It returns a function object class, which uses the same method dispatch mechanism as before. I’ve omitted error checks for brevity.
The class objects have some useful methods for inspecting them: properties returns the object properties and their default values, methods returns the object methods, name returns the class name, and new creates a new instance of the class. new_class also injects a class method into every object, which returns the class object (e.g. given a function object, you can call its class method to get its class object). With these methods, our class objects have no need for Perl’s built-in object toolset of packages, bless, and UNIVERSAL.
The Point class compresses nicely:
    my $class_point = Class::Lambda::new_class(
        'Point',
        {
            x => undef,
            y => undef,
        },
        {
            x => sub { $_[0]->{x} },
            y => sub { $_[0]->{y} },
        });
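To see the class object and an instance in action, here is a self-contained sketch. It repeats a condensed copy of new_class so the script runs on its own; the `// {}` fallback for a missing args hashref is an assumption of mine, not part of the original listing.

```perl
use strict;
use warnings;
use feature qw(say current_sub);

# Condensed copy of new_class from the listing above.
sub new_class {
    my ($class_name, $properties, $methods) = @_;
    my $class_methods = {
        properties => sub { %$properties },
        methods    => sub { %$methods },
        name       => sub { $class_name },
        new        => sub {
            my ($class, $args) = @_;
            my %self;
            %self = (
                %$properties,
                %{ $args // {} },    # assumption: tolerate a missing args hashref
                self => sub {
                    my $method_name = shift;
                    $methods->{$method_name}->(\%self, @_);
                },
            );
            $self{self};
        },
    };
    my $class = sub {
        my $method_name = shift;
        $class_methods->{$method_name}->(__SUB__, @_);
    };
    $methods->{class} = sub { $class };
    $class;
}

my $class_point = new_class(
    'Point',
    { x => undef, y => undef },
    { x => sub { $_[0]->{x} }, y => sub { $_[0]->{y} } },
);

my $p = $class_point->('new', { x => 3, y => 10 });
say $p->('x');                        # 3
say $p->('class')->('name');          # Point
my %defaults = $class_point->('properties');
say join ',', sort keys %defaults;    # x,y
```

Both the class and the instance are plain functions, so calling a method on either one looks the same: pass the method name first, then any arguments.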
One wrinkle here is the naive copying of constructor args into the object state. If the args themselves contain references, the caller could change the state of the references without using the object’s interface (assuming they retained a reference to the data). To prevent that, the code could be updated to deep-copy any references that have a refcount greater than 1.
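The wrinkle is easy to reproduce. The sketch below uses a hypothetical make_object helper as a stand-in for the constructor, and deep-copies with dclone from the core Storable module. Note that it copies unconditionally rather than checking refcounts, and that dclone refuses code references by default, so this only suits plain data.

```perl
use strict;
use warnings;
use feature 'say';
use Storable qw(dclone);

# Hypothetical stand-in for the constructor: copies args into private state.
sub make_object {
    my ($args, $deep) = @_;
    my %self = %{ $deep ? dclone($args) : $args };
    return sub { $self{ $_[0] } };    # a single generic getter
}

my $tags    = ['a', 'b'];
my $shallow = make_object({ tags => $tags });
my $deep    = make_object({ tags => $tags }, 1);

push @$tags, 'c';    # the caller mutates data it still holds a reference to

say scalar @{ $shallow->('tags') };    # 3: the object's state changed
say scalar @{ $deep->('tags') };       # 2: the deep copy is unaffected
```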
Inheritance
This wouldn’t be much of an object system if it didn’t support inheritance. I’ve extended the new_class subroutine:
    sub new_class {
        my ($class_name, $properties, $methods, $superclass) = @_;
        my $class_methods = {
            superclass => sub { $superclass },
            properties => sub { %$properties },
            subclass   => sub {
                my ($superclass, $class_name, $properties, $methods) = @_;
                $properties = { $superclass->('properties'), %$properties };
                $methods = {
                    # prevent changes to subclass method changing the super
                    (map { ref $_ ? _clone_method($_) : $_ } $superclass->('methods')),
                    %$methods };
                new_class($class_name, $properties, $methods, $superclass);
            },
            methods => sub { %$methods },
            name    => sub { $class_name },
            new     => sub {
                my $class = $_[0];
                my %self;
                %self = (%$properties,
                         %{$_[1]},
                         self => sub {
                             my $method_name = shift;
                             $methods->{$method_name}->(\%self, @_);
                         });
                $self{self};
            },
        };
        my $class = sub {
            my ($method_name) = shift;
            $class_methods->{$method_name}->(__SUB__, @_);
        };
        $methods->{class} = sub { $class };
        $class;
    }
    sub _clone_method {
        my $sub = shift;
        sub { goto $sub };
    }
It now accepts an optional superclass argument. I’ve also added two new methods to call on the class object: superclass returns the superclass object and subclass accepts similar arguments to new_class and creates a new class built with the current class properties and methods and its arguments. Because it uses list-flattening to combine the key/value pairs of properties and methods, and because the superclass data is listed first, the subclass specification always overrides the superclass.
Superclass methods are copied using _clone_method to prevent method re-definition also redefining the superclass method. For now I’ve accomplished this with goto; every subclass adds a new layer of indirection. This could be implemented in XS to avoid the indirection cost; Sub::Clone does this, but it doesn’t work on v5.18 or higher (I guess the Perl interpreter internals changed and it needs an update).
Here’s a subclass of Point which in addition to storing x y coordinates, accepts a “z” value, to store a point in 3d coordinates. It overrides to_string to include the new value:
    my $class_point3d = $class_point->('subclass',
        'Point3D',
        { z => undef },
        {
            to_string => sub { "x: $_[0]->{x}, y: $_[0]->{y}, z: $_[0]->{z}" },
            z         => sub { shift->{z} },
        });
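Putting the pieces together, the sketch below condenses the inheritance-aware new_class into one script and exercises the subclass. Two things are assumptions of mine: the base Point class is given a to_string method so there is something to override, and _clone_method is written with the `goto &$sub` spelling.

```perl
use strict;
use warnings;
use feature qw(say current_sub);

sub _clone_method { my $sub = shift; sub { goto &$sub } }

sub new_class {
    my ($class_name, $properties, $methods, $superclass) = @_;
    my $class_methods = {
        superclass => sub { $superclass },
        properties => sub { %$properties },
        methods    => sub { %$methods },
        name       => sub { $class_name },
        subclass   => sub {
            my ($super, $name, $props, $meths) = @_;
            new_class(
                $name,
                { $super->('properties'), %$props },    # subclass values win
                { (map { ref $_ ? _clone_method($_) : $_ } $super->('methods')),
                  %$meths },
                $super,
            );
        },
        new => sub {
            my ($class, $args) = @_;
            my %self;
            %self = (%$properties, %{ $args // {} },
                self => sub {
                    my $method_name = shift;
                    $methods->{$method_name}->(\%self, @_);
                });
            $self{self};
        },
    };
    my $class = sub {
        my $method_name = shift;
        $class_methods->{$method_name}->(__SUB__, @_);
    };
    $methods->{class} = sub { $class };
    $class;
}

my $class_point = new_class('Point',
    { x => undef, y => undef },
    { x => sub { $_[0]->{x} }, y => sub { $_[0]->{y} },
      to_string => sub { "x: $_[0]->{x}, y: $_[0]->{y}" } });   # assumed method

my $class_point3d = $class_point->('subclass', 'Point3D',
    { z => undef },
    { z => sub { $_[0]->{z} },
      to_string => sub { "x: $_[0]->{x}, y: $_[0]->{y}, z: $_[0]->{z}" } });

my $p = $class_point3d->('new', { x => 1, y => 2, z => 3 });
say $p->('to_string');                          # x: 1, y: 2, z: 3
say $p->('x');                                  # 1 (inherited from Point)
say $p->('class')->('superclass')->('name');    # Point
```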
Traits
Single inheritance is quite limited; I could add support for multiple inheritance by accepting an arrayref of superclasses, and making method resolution more sophisticated. Instead I’m going to support traits, which avoid the complexity of multiple inheritance and allow class behavior to be extended in a more flexible way.
First I’ll add support for creating new traits:
    sub new_trait {
        my ($trait_name, $methods, $requires) = @_;
        my $trait_methods = {
            requires => sub { @$requires },
            methods  => sub { %$methods },
            name     => sub { $trait_name },
        };
        sub {
            my $method_name = shift;
            $trait_methods->{$method_name}->();
        };
    }
This is implemented as the (what should be familiar by now) function object pattern. Every trait object has 3 methods: requires returns a list of required method names, methods returns key/value pairs of method names and anonymous subroutines, and name returns the trait’s name.
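As an illustration, here is how a trait might be defined and inspected. The Distance trait and its distance method are hypothetical examples of mine; new_trait is copied from the listing above so the script runs on its own.

```perl
use strict;
use warnings;
use feature 'say';

# Copy of new_trait from the listing above.
sub new_trait {
    my ($trait_name, $methods, $requires) = @_;
    my $trait_methods = {
        requires => sub { @$requires },
        methods  => sub { %$methods },
        name     => sub { $trait_name },
    };
    sub {
        my $method_name = shift;
        $trait_methods->{$method_name}->();
    };
}

# Hypothetical trait: anything with x and y can report its distance
# from the origin. Trait methods receive the object state hashref.
my $trait_distance = new_trait(
    'Distance',
    { distance => sub { my $self = shift; sqrt($self->{x} ** 2 + $self->{y} ** 2) } },
    ['x', 'y'],
);

say $trait_distance->('name');                  # Distance
say join ',', $trait_distance->('requires');    # x,y
my %m = $trait_distance->('methods');
say $m{distance}->({ x => 3, y => 4 });         # 5
```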
Classes can be composed with traits using the compose method, which looks like this:
    sub new_class {
        my ($class_name, $properties, $methods, $superclass) = @_;
        my $traits = [];
        my $class_methods = {
            ...
            compose => sub {
                my ($class, @traits) = @_;
                for my $t (@traits) {
                    next if $class->('does', $t->('name'));
                    my @missing = grep { !$methods->{$_} } $t->('requires');
                    die sprintf('Cannot compose %s as %s is missing: %s',
                        $t->('name'), $class_name, join ',', @missing) if @missing;
                    my %trait_methods = $t->('methods');
                    for my $m (keys %trait_methods) {
                        next if exists $methods->{$m}; # clashing methods are excluded
                        # prevent changes to composed class method changing the trait
                        $methods->{$m} = _clone_method($trait_methods{$m});
                    }
                    push @$traits, $t;
                }
            },
            traits => sub { @$traits },
            does   => sub {
                my ($class, $trait_name) = @_;
                grep { $trait_name eq $_->('name') } @$traits;
            },
            ...
This isn’t a precise implementation of traits; in the original paper traits are not given access to the state of the object (except via its methods). That would require storing trait methods in a separate hashref, not passing the object state as an argument when the methods are called, and updating method dispatch to include searching the object’s trait methods hashref.
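The stricter variant described above could look roughly like this. It is a sketch under my own assumptions, not code from the proof-of-concept: new_object is a hypothetical constructor that keeps trait methods in a separate hashref and hands them the object function instead of the state hashref.

```perl
use strict;
use warnings;
use feature qw(say current_sub);

# Hypothetical constructor: own methods see the private state,
# trait methods see only the object's public interface.
sub new_object {
    my ($state, $methods, $trait_methods) = @_;
    return sub {
        my $method_name = shift;
        if (my $m = $methods->{$method_name}) {
            return $m->($state, @_);       # own method: gets the state hashref
        }
        if (my $t = $trait_methods->{$method_name}) {
            return $t->(__SUB__, @_);      # trait method: gets the object itself
        }
        die "No method '$method_name'\n";
    };
}

my $p = new_object(
    { x => 3, y => 4 },
    { x => sub { $_[0]->{x} }, y => sub { $_[0]->{y} } },
    { distance => sub { my $obj = shift; sqrt($obj->('x') ** 2 + $obj->('y') ** 2) } },
);

say $p->('distance');    # 5
```

The trait's distance method never touches the hashref; it can only ask the object for x and y, which is the access rule the traits paper describes.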
Metamethods
Whereas methods are concerned with object state, metamethods deal with object structure. Because function objects control their method dispatch, it’s trivial to modify dispatch to support metamethods like before and after which run code before or after a method is called:
    sub new_class {
        my ($class_name, $properties, $methods, $superclass) = @_;
        my $traits = [];
        my $class_methods = {
            ...
            before => sub {
                my ($class, $method_name, $sub) = @_;
                my $original_method = $methods->{$method_name};
                $methods->{$method_name} = sub {
                    my $self = shift;
                    my @args = $sub->($self, @_);
                    $original_method->($self, @args);
                };
            },
            after => sub {
                my ($class, $method_name, $sub) = @_;
                my $original_method = $methods->{$method_name};
                $methods->{$method_name} = sub {
                    my @results = $original_method->(@_);
                    $sub->($_[0], @results);
                };
            },
            ...
Whilst this works, it feels like the code is starting to get unwieldy. What I really need is a Metaobject Protocol. Instead of defining methods in a hashref of anonymous functions, I could have a “make_method” metamethod, which registers a new method in a class. Method registration would provide the opportunity to do things like multiple-dispatch; that is, a class could have several methods with the same name, dispatched to based on the arguments received at runtime (aka multimethods). This is one way of solving the Expression Problem.
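To make the wrapping behavior of before and after concrete, here is a standalone sketch that applies the same idea to a plain hash of methods. The scale method and both hooks are hypothetical; the wrappers mirror the metamethods above, with the hook able to rewrite arguments (before) or results (after).

```perl
use strict;
use warnings;
use feature 'say';

my %methods = (
    scale => sub { my ($self, $factor) = @_; $self->{x} * $factor },
);

# Same wrapping idea as the "before" metamethod, applied to a plain hash.
sub before {
    my ($method_name, $sub) = @_;
    my $original = $methods{$method_name};
    $methods{$method_name} = sub {
        my $self = shift;
        my @args = $sub->($self, @_);    # the hook may rewrite the arguments
        $original->($self, @args);
    };
}

sub after {
    my ($method_name, $sub) = @_;
    my $original = $methods{$method_name};
    $methods{$method_name} = sub {
        my @results = $original->(@_);
        $sub->($_[0], @results);         # the hook may rewrite the results
    };
}

before(scale => sub { my ($self, $factor) = @_; abs $factor });
after(scale  => sub { my ($self, $result) = @_; $result + 1 });

say $methods{scale}->({ x => 2 }, -5);    # 11: abs(-5) * 2, then + 1
```

Each call to before or after replaces the stored method with a new closure around the old one, so hooks stack in the order they are registered.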
Speed
By this point you might be wondering how fast function objects are; I ran some benchmarks to compare built-in OO, Moose and Class::Lambda objects. These show that function objects are at least in the ballpark of acceptable performance for construction, get and set methods. Once you add type constraints, error checking and deep-copies of arguments (Moose deep-copies its args), I don’t think these differences would matter in most cases. For example, if I add isa => 'Int' to the Moose Point class’s x property, its setter benchmark is ~4x slower.
                       Rate    moose-new   lambda-new  builtin-new
    moose-new      714757/s           --         -13%         -64%
    lambda-new     817247/s          14%           --         -59%
    builtin-new   2012803/s         182%         146%           --

                       Rate   lambda-get    moose-get  builtin-get
    lambda-get    5804047/s           --         -46%         -61%
    moose-get    10789651/s          86%           --         -27%
    builtin-get  14813317/s         155%          37%           --

                       Rate   lambda-set  builtin-set    moose-set
    lambda-set    4272114/s           --         -46%         -46%
    builtin-set   7855213/s          84%           --          -0%
    moose-set     7886981/s          85%           0%           --
Point function objects did use about 3x more memory than their built-in and Moose style equivalents on my computer: 812 bytes compared to 266 bytes (I was surprised to find simple Moose objects are as memory efficient as built-in ones). This is because function objects carry around more data, but also because closures require more Perl internal data structures. I could save memory by not copying every class instance method as a key/value pair into every object, and resolve method calls with a recursive search of the object’s class hierarchy instead. This trades memory for speed though.
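The memory-saving alternative could be sketched like this. The class records here are plain hashrefs of my own invention (not the Class::Lambda class objects), each holding only its own methods plus a link to its parent; find_method and new_object are hypothetical helpers.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical class records: own methods plus a parent link.
my $point = {
    name    => 'Point',
    parent  => undef,
    methods => { x => sub { $_[0]->{x} }, y => sub { $_[0]->{y} } },
};
my $point3d = {
    name    => 'Point3D',
    parent  => $point,
    methods => { z => sub { $_[0]->{z} } },
};

# Walk up the hierarchy until some class defines the method.
sub find_method {
    my ($class, $method_name) = @_;
    return unless $class;
    return $class->{methods}{$method_name}
        // find_method($class->{parent}, $method_name);
}

# Objects hold only their state and a reference to their class.
sub new_object {
    my ($class, %state) = @_;
    return sub {
        my $method_name = shift;
        my $method = find_method($class, $method_name)
            or die "$class->{name} has no method '$method_name'\n";
        $method->(\%state, @_);
    };
}

my $p = new_object($point3d, x => 1, y => 2, z => 3);
say $p->('z');    # 3, found on Point3D
say $p->('x');    # 1, found by searching up to Point
```

Every call now pays for the search, which is the speed cost mentioned above; caching resolved methods per class would be one way to soften it.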
Future
I’ve uploaded this proof-of-concept to GitHub. If you’re interested in learning more about metaobjects, The Art of the Metaobject Protocol is the definitive reference. For what it’s worth, Moose is Metaobject Protocol aware, battle-tested and remains the classiest (har) object system available for Perl today.
Perl’s evolution into a kitchen-sink of capabilities provides many tools: some powerful, some mediocre. The question is, where do we go from here? I’m not convinced “more OO” is the right direction for Perl; the language is already huge, the interpreter a byzantine labyrinth of C macros, and Ruby cornered the market for expressive, object-oriented dynamic languages long-ago.
One way to fight the bloat would be to distill the role of the Perl interpreter down to fewer, more powerful ideas. Objects are more powerful than subroutines, and a Metaobject Protocol more profound still. Yet beneath that, lexical scoping and a thoughtful type system could power them all¹.
References
- Doug Hoyte writes in Let Over Lambda: “Let and lambda are fundamental; objects and classes are derivatives.”
This article was originally posted on Perl.com.