Notice: This essay was initially published below the demo on the homepage, intended to provide an overview of Zyme’s unique features and a brief progress update. As a result, any references to 'above' or similar terms point to the interactive demo.

12th September 24

Zyme in practice

While mutating the “Hello Creatures.” program above to produce garbled output (or revealing the “Secret!” message) may illustrate the quirky features of Zyme, it is far from real-world machine learning. This example was selected as a familiar context to illustrate core features of the virtual machine, like its fuzzy matching motif system, rather than reflecting a practical application. Nevertheless, Zyme does include tools for creating sophisticated initial programs beyond this simple demonstration. Further, preliminary experiments suggest that mutating these programs can generate non-trivial variation in behaviour, indicating that, maybe, just maybe, Zyme could serve as a foundation for a novel machine learning approach.

Language features

The design of Zyme's language balances the capacity for abstraction with evolutionary potential. While abstraction and encapsulation help developers manage complexity by dividing programs into independent components and modules, this can conflict with the goal of being an evolvable language. A primary benefit of modularization — preventing unintended interactions between independent components — can impede evolution. These interactions aren't inherently harmful; rather, the risk lies in manipulating data structures in ways that breach abstraction boundaries, potentially leading to unpredictable or undefined behaviour.

Traditional programming languages view this undefined behaviour as dangerous. However, in the context of an evolvable language like Zyme, this unpredictability becomes a feature. It's precisely this potential for unexpected interactions that drives evolution in the system.

Zyme employs macros to create abstractions without restricting runtime interactions. These macros can group functionally related strands, allowing them to be treated as single units in the source code. During compilation, any explicit trace of the groups is lost as all strands are ultimately compiled to byte arrays. This ensures that when resulting bytecode undergoes mutation, nothing prevents potential interactions between strands, regardless of their original source code organisation.

Source 1 illustrates a macro that transforms a given sequence of instructions into a ‘function’ call by emulating a stack frame. Instead of executing instructions directly within a 'caller' strand, they are placed in a separate 'callee' strand, which is then invoked through the motif binding system. The outcome is that this macro abstracts over the motif system involved in managing multiple strands, making it transparent to the user at the point of use (shown in Source 2).

To elaborate, the first line in Source 1 defines the remote macro, which takes a single argument, block - the block of instructions to be called in a separate stack frame. The macro's body uses a nested lexical scope, denoted by { ... }, where new definitions are locally confined. Within this scope, the first statement defines address, which serves as the unique motif for strand binding. Rather than specifying it literally, % is used to generate a fresh motif: a new motif guaranteed to be unique by the compiler. This feature allows multiple calls to the macro, as each instance will contain a distinct motif locally bound to address.

Next, a new strand is generated that will open a new binding site containing the address motif, using the WAIT instruction as seen in the initial interactive demonstration. The block of instructions provided as the argument, block, is then inserted into this strand, followed by the RTRN instruction. This will become the ‘callee’ strand.

The macro concludes with the emit keyword, specifying what remains in place of the lexical scope after macro evaluation. In this instance, it's a sequence of instructions that initiates the 'function' call using the CALL instruction on the address motif. This will become part of the ‘caller’ strand. Note, both the generated 'callee' strand and the instructions in the 'caller' strand now operate with the same shared motif.

When executed in the virtual machine, the CALL instruction causes the two strands to bind, temporarily shifting execution control from the 'caller' strand's call site to the 'callee' strand. Execution resumes where the 'callee' strand was waiting - after the address motif consumed by the WAIT instruction, at the start of the block argument passed to the macro. Once the block instructions complete, the RTRN instruction triggers strand unbinding, returning control to the 'caller' strand's waiting point: after the address motif consumed by the CALL instruction.

Try this macro by copying Source 1 into the initial interactive demonstration source code, and substituting the original trim_motif macro with Source 2. Note, in Source 2, the square brackets [ ... ] group instructions into a single argument for the remote macro. These changes do not alter the initial program’s behaviour; however, the resulting bytecode now contains an additional strand.

Despite targeting a virtual machine with an esoteric instruction set, Zyme enables developers to create abstractions and write structured code without compromising on evolutionary potential.

Source 1

let remote block => {
    let address => % ;
    WAIT address | block RTRN ;

    emit => CALL address | ;
};

Source 2

let trim_motif =>
    remote [ insert_empty |> |> ] ;

Machine learning and mutations

For Zyme to serve as a viable foundation for a machine learning approach, it must be possible to effectively evolve its bytecode. This means significantly improving program performance through iterative random mutations.

This approach to optimising or improving computer programs - genetic programming - is not new and has a rich literature. The process operates on a population of programs in a two-step cycle of mutation and selection. Traditionally, the first generation starts with a population of random programs; in this case, I used a population constructed in the manner described earlier. The process operates as follows:

  1. mutation - randomly mutate programs in the current population to generate new candidate programs.
  2. selection - evaluate each new candidate program's performance by assessing its ability to produce desired outputs for given inputs, similar to supervised learning. Choose the top-performing programs to form the next generation's population.

The process repeats, and theoretically the performance of the programs should improve over successive generations.
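This two-step cycle can be sketched in a few lines of Python. This is a generic genetic-programming loop, not Zyme's actual implementation: the mutate and score functions below are illustrative placeholders (a toy integer 'program' whose fitness is its closeness to a target), standing in for Zyme's bytecode mutation and output evaluation.

```python
import random

def evolve(population, mutate, score, generations=100, offspring_per_parent=4):
    """Generic mutation/selection loop. `mutate` and `score` are
    placeholders for Zyme's bytecode mutation and fitness evaluation."""
    target_size = len(population)
    for _ in range(generations):
        # 1. mutation: generate new candidate programs from the current population
        candidates = [mutate(p) for p in population for _ in range(offspring_per_parent)]
        # 2. selection: keep the top performers to form the next generation
        population = sorted(population + candidates, key=score, reverse=True)[:target_size]
    return population

# Toy demonstration: "programs" are integers, fitness is closeness to 42.
random.seed(0)
best = evolve(
    population=[0] * 10,
    mutate=lambda p: p + random.randint(-3, 3),
    score=lambda p: -abs(p - 42),
)[0]
```

Because selection always retains the fittest individuals, performance is monotonically non-decreasing across generations in this simple elitist variant; whether the improvements are non-trivial is the interesting question for Zyme.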

But does this work in practice for Zyme?

In the loose sense, Zyme works; I can observe performance improvements over time as programs undergo mutation. While it is not hard to find mutations that improve performance at a given task, this phenomenon isn't unique to Zyme; randomly adjusting the parameters of any machine learning system or statistical model would likely yield some degree of improvement over time. The critical issue is whether mutations of Zyme bytecode can reliably produce non-trivial improvements, potentially yielding results competitive with established machine learning methods.

This is something I am actively exploring. I am currently testing Zyme on small-scale problems to assess its effectiveness and resolve unforeseen issues that emerged after the design phase. The results, while limited so far, are encouraging.

I have been exploring many different ways to mutate the bytecode - often referred to as genetic operators - from single point mutations to more elaborate crossover techniques. Interestingly, I have observed that programs quickly increase their resistance to mutations over the first few generations. In multiple trials, a specific mutation regime (combining point mutations and crossover) initially results in only a ~3% survival rate of mutated programs, where survival means the program still outputs a correctly formatted return value, allowing its performance to be evaluated. However, in subsequent generations the survival rate can climb to as high as ~60%.
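For concreteness, here is a rough Python sketch of these two kinds of genetic operator applied to plain byte arrays. The specific forms (uniform point mutation, single-point crossover) and the mutation rate are generic textbook choices for illustration, not Zyme's actual regime.

```python
import random

def point_mutate(bytecode: bytes, rate: float = 0.01) -> bytes:
    """Replace each byte with a uniformly random byte with probability `rate`."""
    return bytes(
        random.randrange(256) if random.random() < rate else b
        for b in bytecode
    )

def crossover(parent_a: bytes, parent_b: bytes) -> bytes:
    """Single-point crossover: splice a prefix of one parent onto a
    suffix of the other, cutting each at an independent random point."""
    cut_a = random.randrange(len(parent_a) + 1)
    cut_b = random.randrange(len(parent_b) + 1)
    return parent_a[:cut_a] + parent_b[cut_b:]

random.seed(1)
child = crossover(point_mutate(b"\x01\x02\x03\x04" * 8), b"\xff" * 16)
```

Note that crossover with independent cut points can change program length, which is one route by which bloat (discussed below) can creep in.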

It's tempting to attribute this phenomenon to a statistical artifact induced by bloat. Bloat occurs when programs expand uncontrollably, often incorporating 'dead-code' regions in the bytecode that provide no functionality and don't directly improve performance. This surplus code may dilute the effects of mutation and, by shielding the functional bytecode, produce an apparent increase in mutation resistance.
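The dilution argument amounts to a back-of-the-envelope calculation. Assuming a single point mutation lands uniformly at random in the bytecode, and only hits on functional bytes can be fatal, padding with dead code directly lowers the chance of a fatal hit (the byte counts here are made up for illustration):

```python
# Simplified model: one point mutation lands uniformly at random in the
# bytecode; only mutations landing on functional bytes can break the program.
functional = 100                                 # illustrative byte count

# Without bloat, every byte is functional, so every mutation is a hit.
p_hit_lean = functional / functional             # probability 1.0

# With 400 bytes of dead code, most mutations land harmlessly in padding.
bloated_total = functional + 400
p_hit_bloated = functional / bloated_total       # probability 0.2
```

Under this model, bloat raises survival rates without changing what surviving programs do, which is exactly why the rise in performance variation described below is hard to square with bloat alone.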

While I've observed bloat in Zyme, I don’t think this is driving the increase in mutation resistance and survival rate. Surprisingly, as the survival rate of mutated programs increases, I'm also observing a significant increase in variation in performance among these programs at each generation.

In early generations, when survival rates are ~3%, most surviving mutations are synonymous – new programs typically maintain the same performance as their predecessors, with performance improvements being very rare. However, in later generations as survival rates climb, I observe a shift: fewer synonymous mutations and more non-fatal mutations that alter performance levels. This simultaneous increase in mutation resistance, survival rate, and performance variation would seem counterintuitive if bloat were the primary driver.

I am not sure what is going on here, but this is one of many interesting early observations. My next steps are to start scaling up to see how Zyme performs on less trivial problems.