Zyme - an evolvable language

Frequently asked questions

Can I download any source code or binaries?

What do you mean by a "biologically-inspired" or strand-based virtual machine?

What is the difference between the bytecode and source code formats?

What does fuzzy control flow mean?

Are there any projects that inspired the virtual machine and language?

How does conditional control flow work, I can’t see any in the demo?

Are you in academia?

Can I download any source code or binaries?

Currently, the only place you can access this language is the sandbox on the homepage.

If you are interested and would like to see more, especially if you research genetic programming, please reach out to hamish [at] waldzell [dot] xyz. I would love to hear from you.

What do you mean by a "biologically-inspired" or strand-based virtual machine?

The Zyme virtual machine operates via a unique automaton, whose model of computation - how it represents, manipulates, and transforms data - mirrors how biological systems process information. Specifically, it emulates how living cells interpret and transform molecular signals.

Proteins as molecular automata

Cells contain diverse biomolecules including DNA, RNA, metabolites, and crucially, proteins. These proteins are built from chains of amino acids whose specific sequence dictates their chemical properties. Many proteins are enzymes and catalyze specific reactions, including converting one type of molecule into another, or modifying another protein’s sequence.

Each protein can be viewed as a tiny molecular automaton running a micro-program encoded in its amino acid sequence. The substrate molecules it binds serve as input data, while the catalytic products represent the output data. As in computers, these input-output processes are not static but can be dynamically controlled by additional molecular signals.

On their own, proteins have limited computational power and exhibit relatively simple behaviors. However, within a cell, proteins are organized into vast, coordinated networks capable of producing complex behaviors. Collectively, these proteins - each following its own micro-program - operate as a single, unified automaton governing the cell's overall behavior, enabling it to respond intelligently to its environment and survive in unstable conditions.

The Zyme virtual machine adopts this ensemble-of-micro-programs model of computation.

Strand-based virtual machine

The digital counterpart to a protein in the Zyme virtual machine is a strand. Rather than being composed of amino acids, strands are sequences of bytes. Like proteins, each strand operates as an autonomous automaton: its byte sequence is interpreted as a micro-program, with individual bytes encoding machine instructions.

Because strands are simply byte sequences, they are both code and data; a strand micro-program consumes strands as input and produces strands as output. In fact, everything in the Zyme virtual machine is a strand - hence the term strand-based.

A complete Zyme program is an initial collection of strands, all interpreted as code and executed simultaneously. As these micro-programs run, they exchange strands and modify each other. These interactions facilitate complex computations - just as complex cellular behavior emerges from coordinated networks of proteins.
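To make the "both code and data" idea concrete, here is a minimal Python sketch. It is not Zyme's instruction set: the two opcodes and the `run_strand` function are hypothetical, invented only to show the same byte sequence acting as a program in one step and as input in the next.

```python
# Toy model only: these opcodes are invented for illustration and are
# not Zyme's real instruction set.
APPEND_INPUT = 0x01     # hypothetical: append the input strand to the output
APPEND_REVERSED = 0x02  # hypothetical: append the reversed input strand


def run_strand(program: bytes, data: bytes) -> bytes:
    """Interpret `program` as code, consuming `data` and returning a new strand."""
    output = bytearray()
    for opcode in program:
        if opcode == APPEND_INPUT:
            output += data
        elif opcode == APPEND_REVERSED:
            output += data[::-1]
        # Any other byte is treated as a no-op in this toy model.
    return bytes(output)


# The same byte sequences play both roles: `a` is run as code on `b`,
# then `b` is run as code on `a`.
a = bytes([0x01, 0x02])
b = bytes([0x02, 0x07])

print(run_strand(a, b).hex())  # 02070702
print(run_strand(b, a).hex())  # 0201
```

In the real virtual machine many strands run at once and exchange strands with each other; this sketch only shows a single pairwise interaction.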

What is the difference between the bytecode and source code formats?

The Zyme virtual machine executes programs in the bytecode format: a binary representation that defines a collection of strands (sequences of machine instructions) whose execution determines the program's control flow and behavior. This bytecode format is robust to mutations, as any binary data constitutes a valid bytecode program.
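One way a format can guarantee that any binary data is a valid program is to make instruction decoding total, so that every possible byte value maps to some instruction. The Python sketch below illustrates that property under stated assumptions: the opcode names and the modulo mapping are hypothetical and are not taken from Zyme's actual encoding.

```python
import random

# Hypothetical opcode names; Zyme's real instruction set is not shown here.
OPCODES = ["NOP", "PUSH", "POP", "COPY", "SEND", "SWAP", "JOIN", "HALT"]


def decode(byte: int) -> str:
    """Total decoder: every value 0..255 maps to some instruction."""
    return OPCODES[byte % len(OPCODES)]


def mutate(strand: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit, a simple point mutation."""
    if not strand:
        return strand
    mutated = bytearray(strand)
    position = rng.randrange(len(mutated))
    mutated[position] ^= 1 << rng.randrange(8)
    return bytes(mutated)


rng = random.Random(0)
strand = bytes([0x10, 0x2A, 0xFF])
for _ in range(5):
    strand = mutate(strand, rng)
    # Decoding never fails, whatever the mutation produced.
    print([decode(b) for b in strand])
```

Because `decode` has no invalid inputs, a mutation can change what a strand does but can never produce a strand that fails to parse.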

However, the characteristics making the format amenable to evolutionary processes also render it impractical to write directly. Researchers still require the capability to create programs, specifically those to serve as starting points for genetic programming experiments. Poorly chosen initial programs can impede evolution: a program may be too small and lack complexity, rendering it overly fragile under mutation. Alternatively, an initial program might fail to produce any meaningful output for evaluation - for example, if a classification program should return a single byte representing a category but produces no output at all, there is nothing to assess and no way to discriminate between subsequent mutations.

To bridge this gap, Zyme provides a compiler that translates human-readable source code to the bytecode format, enabling researchers to craft bespoke initial programs. This compiler maintains a direct correspondence between the source language instructions and the Zyme virtual machine's architecture; each expression translates to a single strand in the resulting binary output. The compiler also offers extensive macro capabilities, allowing users to define custom macros that assist in building complex multi-strand patterns, such as those that emulate call stacks and function calls.

What does fuzzy control flow mean?

All computer architectures include control flow instructions that enable dynamic behaviour - executing different instruction paths based on inputs to produce varied outputs - making useful computation possible. Consider if-else statements, where a boolean condition determines which of two code blocks executes. This requires two types of machine instructions: comparison instructions, which evaluate the boolean condition by comparing two values, and jump instructions, which transfer control to the selected code branch by jumping to a specified memory address.

Fuzzy control flow in Zyme arises from how its equivalent of jump instructions behaves. Rather than jumping to fixed addresses, destinations are determined dynamically at runtime by finding the best match among available targets. Sometimes this is an exact match; other times execution transfers to an unexpected but sufficiently similar location. The outcome is that execution paths adapt to best fit the encoded instructions based on what's available at runtime.
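A rough sketch of best-match jump resolution is given below in Python. The similarity measure (bitwise Hamming distance), the `max_distance` threshold, and the idea of fixed-length target labels are all assumptions made for illustration; the text above only says that the destination is the best match among the targets available at runtime.

```python
from typing import List, Optional


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def fuzzy_jump(pattern: bytes, targets: List[bytes], max_distance: int = 4) -> Optional[int]:
    """Return the index of the closest target label, or None if nothing is close enough.

    Assumes every label has the same length as `pattern`.
    """
    best_index, best_distance = None, max_distance + 1
    for index, label in enumerate(targets):
        distance = hamming_distance(pattern, label)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


targets = [bytes([0b11110000]), bytes([0b00001111]), bytes([0b10101010])]

print(fuzzy_jump(bytes([0b00001111]), targets))  # exact match: 1
print(fuzzy_jump(bytes([0b00001101]), targets))  # one bit off, still: 1
print(fuzzy_jump(bytes([0b11000011]), targets, max_distance=1))  # nothing similar enough: None
```

The second call is the interesting case for evolution: a single-bit mutation in the jump pattern still resolves to the same destination instead of breaking the program.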

Though incompatible with traditional architectures, this fuzzy control flow is a natural consequence of Zyme's biologically-inspired ensemble-of-micro-programs architecture. As described in the earlier analogy, micro-program strands correspond to proteins, and the fuzzy jump behavior arises from mechanisms that mimic protein binding and unbinding to coordinate how strands interact.

Are there any projects that inspired the virtual machine and language?

Devine Lu Linvega’s uxn

After trying to evolve various dialects of LISP, I began to realise that conventional programming languages - those designed for humans - are inherently incompatible with genetic programming. We depend on well-designed languages to be predictable and understandable, as we struggle when there is a possibility that minor changes in code can result in subtle yet dramatic shifts in behaviour. Ironically, these difficult-to-comprehend unpredictable changes are essential for successful evolution. I sensed that a new evolution-oriented language might be necessary, but I was daunted by the prospect of creating one. I assumed creating a programming language of any kind was a task reserved for people far smarter than me.

But after stumbling upon uxn, a programming ecosystem focused on permacomputing created by Devine Lu Linvega, I was inspired. As I explored its concise implementation - the virtual machine is just ~250 lines of C - it dawned on me that programming languages and virtual machines were, at their core, just code like any other and nothing to be afraid of.

But uxn's impact went beyond technical insights. Linvega’s DIY ethos resonated: confronted with a problem, Linvega didn't wait for other people's solutions - they created their own from scratch.

I knew exactly what I needed in a programming language for my evolutionary experiments. Why not build it myself?

Lee Spector’s Push

I am not the first to develop an evolution-oriented programming language. I believe that distinction belongs to Lee Spector, with his language Push.

Prior to Push, existing programming languages had been adapted for use in evolutionary contexts such as genetic programming, but researchers had never - at least explicitly - questioned whether traditional computer architectures were fundamentally appropriate. Often, the programming language was just one component of a wider project and not the primary focus. For example, both Tierra and Avida, simulations of artificial life, contain virtual machines (which they call virtual CPUs) to control the virtual organisms. They stuck with traditional register-based architectures, and only later tweaked the design to include more unorthodox features.

Although researchers had acknowledged the specific and unique constraints posed by evolutionary computation, Spector was, in my view, the first to recognise that these challenges required programming languages specifically designed for them from the outset. This represented a fundamental shift in perspective, moving away from adapting human-oriented languages to developing evolution-oriented ones.

These constraints were particularly stringent in Spector's pursuit of 'autoconstructive' evolution, where programs themselves contain the mechanisms to control their own reproduction. This effort resulted in the creation of Push, initially a single programming language and later a family of related languages.

While I believe that Spector correctly recognized the need for evolution-oriented programming languages (even if he didn’t use this term), I don’t think he took the idea far enough. Push introduces a bold reimagining of a stack-based machine, with separate stacks for each data type, yet it is still derived from a human-oriented architecture. I believe that more radical, biologically-inspired designs would be even better suited for the task.

Douglas Hofstadter’s Typogenetics

While contemplating potential virtual machine architectures, I was struck by a compelling analogy between proteins and computer programs. Just as a protein is a sequence of amino acids, which together define the protein's function, a computer program is a sequence of machine instructions, which together define the program's function.

Inspired, I began to sketch out a rough schema for an automaton based on strings of machine instructions that would - like proteins interacting - operate on one another. This concept would become the foundation for the Zyme virtual machine.

Suspecting that this form of automaton must have been explored before, I scoured the internet and academic journals. This hunch was confirmed when I discovered Typogenetics, a system described in Douglas Hofstadter's famous book "Gödel, Escher, Bach: An Eternal Golden Braid".

Typogenetics is an automaton composed of sequences of machine instructions called strands, which interact through a complex system of ‘folding’ and ‘binding’ to modify one another's sequences, and therefore their behaviour as programs. This mechanism was designed to emulate protein translation and folding.

Typogenetics influenced the design of Zyme, inspiring the adoption of strand as the term for the basic data structure in my virtual machine. However, the specific rules of Typogenetics and the instructions encoded by each element of a strand did not appear applicable to the evolutionary computation model I was envisioning.

Hofstadter had not imagined Typogenetics as a computer architecture, but as a mathematical game, akin to Conway's Game of Life. The primary puzzle in Typogenetics, similar to the Game of Life, is to discover interesting structures like self-reproducing strands. This framing led Hofstadter to design rules that were engaging for manual problem-solving with pen and paper, rather than a computational model.

While both systems share the core concept of strands as sequences of instructions, the overall architectures of Typogenetics and the Zyme virtual machine ended up diverging significantly.

Simon Hickinbotham’s Stringmol

Hickinbotham et al. identified the potential of Typogenetics (and strand-based computational models more generally) beyond mere puzzles. They recognized that a ‘program’ emerges from the mixing and reactions of sets of these sequences of machine instructions, which they call molecular microprograms. The result is a system called Stringmol, “an automata chemistry for experiments in molecular evolution and artificial life.”

Although artificial life and artificial intelligence (specifically in the context of genetic programming) both aim to evolve programs, the goals differ enough to lead to distinct system designs. Specifically, Stringmol's focus on self-reproduction, open-ended evolution, and its commitment to faithfully representing real molecular biology - none of which are strong aims of Zyme - results in an interesting model of cellular biology but not a virtual machine.

Nevertheless, Hickinbotham et al. remain conscious of the implications for machine learning, as highlighted in a 2011 presentation where they stated, "Artificial Life is a bottom-up approach to Artificial Intelligence, and Artificial Chemistry is a bottom-up approach to building ALife." I would be intrigued to see how they might adapt Stringmol for applications in artificial intelligence.

How does conditional control flow work, I can’t see any in the demo?

Conditional control flow in Zyme is primarily achieved through the COND instruction. This instruction conditionally skips the next byte (and thus the next instruction) in a strand during evaluation. Fully explaining how this works, and what it implies, requires a detailed description of the virtual machine's structure. Unfortunately, I haven't written that up yet, so you'll just have to trust me when I say that Zyme is Turing complete.
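Until that write-up exists, the skip-next-byte idea can be illustrated with a toy interpreter in Python. Only the name COND and its "skip the next byte" behaviour come from the description above; the byte encodings, the stack, the other instructions, and the rule that a zero value triggers the skip are all assumptions made for this sketch.

```python
# Hypothetical encodings; only the name COND and its "skip the next byte"
# behaviour come from the text above.
PUSH0, PUSH1, COND, EMIT_A, EMIT_B = 0x00, 0x01, 0x02, 0x03, 0x04


def evaluate(strand: bytes) -> str:
    """Run a toy strand and return the characters it emits."""
    stack, output, pc = [], [], 0
    while pc < len(strand):
        op = strand[pc]
        pc += 1
        if op == PUSH0:
            stack.append(0)
        elif op == PUSH1:
            stack.append(1)
        elif op == COND:
            # Assumed condition: skip the next byte when the popped value is 0
            # (an empty stack counts as 0).
            flag = stack.pop() if stack else 0
            if flag == 0:
                pc += 1
        elif op == EMIT_A:
            output.append("A")
        elif op == EMIT_B:
            output.append("B")
        # Unknown bytes are ignored in this toy model.
    return "".join(output)


print(evaluate(bytes([PUSH1, COND, EMIT_A, EMIT_B])))  # AB
print(evaluate(bytes([PUSH0, COND, EMIT_A, EMIT_B])))  # B
```

A single skipped byte is enough to build an if-style branch when the byte following COND is itself a control-transfer instruction, which is one plausible reading of why no explicit conditionals are visible in the demo.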

Are you in academia?

Yes, although it's largely separate from my work on Zyme and evolutionary artificial intelligence. My academic research applies evolutionary approaches to molecular genetics, with a focus on understanding RNA regulation during transcription.