reparrot: Notes from the Lorito Braindump

Last Thursday, allison, chromatic, dukeleto and I met to discuss the direction Lorito is taking and to get as much as we could out of chromatic's head and into the wider world. As it turns out, we came up with some significant changes to the design of Lorito as an interpreter, but I think they'll prove quite beneficial once they solidify a bit. The following summary is a bit less warty and less incomplete than the rough notes I nopasted to #parrot right after the meeting, but there are still a number of unanswered questions. I'll recap these at the end.

Terminology

M0 - Lorito ops. Think of the M as standing for magic: M0 has none, i.e. no complex behaviors or subtleties. Higher levels are M1 (anything built from M0, e.g. PIR), M2 (nqp-rx and winxed) and M3 (Rakudo and Partcl).

Context is the new Interp

The biggest decision we made was that contexts would play most of the roles that the interpreter currently fills. They will contain all the mutable state needed by a running program. This includes the PC, registers, return PC, exception handler PC, exception payload and a pointer to the calling context. Some things, such as bytecode segments and iglobals, will still belong to the interp, but the interp will be going on a pretty severe diet for Lorito. The GC may or may not live in the interp. We'll flesh this out as we go.
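
To make the split concrete, here is a minimal C sketch of what the two structures might look like. The names (m0_interp, m0_context, m0_context_new, M0_NUM_REGS) and all field types are hypothetical; the fields simply mirror the list in the paragraph above, and the GC is left out because its home is still undecided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define M0_NUM_REGS 256   /* hypothetical register-file size */

/* Hypothetical: what stays with the (much slimmer) interp is shared,
 * mostly static data. */
typedef struct m0_interp {
    const uint8_t **segments;      /* loaded bytecode segments */
    size_t          num_segments;
    void           *iglobals;      /* interpreter globals, opaque here */
} m0_interp;

/* Hypothetical: everything a running program mutates lives in its context. */
typedef struct m0_context {
    m0_interp         *interp;            /* shared state */
    size_t             pc;                /* program counter */
    size_t             return_pc;         /* where execution resumes after a return */
    size_t             handler_pc;        /* current exception handler */
    void              *exception_payload; /* exception in flight, if any */
    struct m0_context *caller;            /* calling context; NULL at top level */
    uint64_t           regs[M0_NUM_REGS]; /* register file */
} m0_context;

/* A fresh top-level context: zeroed registers, PC at 0, no caller. */
m0_context *m0_context_new(m0_interp *interp) {
    assert(interp != NULL);
    m0_context *ctx = calloc(1, sizeof *ctx);
    if (ctx != NULL)
        ctx->interp = interp;
    return ctx;
}
```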

Having an explicit PC also means that a dedicated goto op is no longer necessary in M0. Jumping around within (or between) bytecode segments simply means setting the PC explicitly to an address rather than letting it increment automatically. We could also allow the PC to escape into the system stack for ffi, though this idea hasn't been sanity-checked yet and may in fact be insane. All of this is, of course, very low-level M0 stuff; higher-level languages will have all of the proper control flow constructs.
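
A minimal sketch of why goto becomes redundant, assuming (hypothetically) that the PC is simply register 0 and that every op is four bytes. The loop advances the PC before running each op, so any op that writes register 0, here a plain set-immediate, acts as a jump. The opcodes and the m0_run function are invented for illustration and are not the real M0 op set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: register 0 *is* the PC, counted in ops, not bytes. */
#define M0_REG_PC   0
#define M0_NUM_REGS 256   /* any 8-bit operand is a valid register index */
#define M0_OP_WIDTH 4     /* opcode + three operand bytes */

/* Hypothetical opcodes; note that there is no goto. */
enum { M0_HALT = 0, M0_SET_IMM = 1, M0_ADD = 2 };

typedef struct {
    uint32_t regs[M0_NUM_REGS];   /* regs[M0_REG_PC] is the program counter */
} m0_context;

/* Run until M0_HALT. The PC is advanced *before* each op executes, so an op
 * that writes M0_REG_PC overrides the increment and thereby jumps. */
void m0_run(m0_context *ctx, const uint8_t *code, size_t num_ops) {
    for (;;) {
        uint32_t pc = ctx->regs[M0_REG_PC];
        assert(pc < num_ops);
        const uint8_t *op = code + (size_t)pc * M0_OP_WIDTH;
        ctx->regs[M0_REG_PC] = pc + 1;          /* default: fall through */
        switch (op[0]) {
        case M0_HALT:
            return;
        case M0_SET_IMM:                        /* regs[a] = b (8-bit immediate) */
            ctx->regs[op[1]] = op[2];
            break;
        case M0_ADD:                            /* regs[a] = regs[b] + regs[c] */
            ctx->regs[op[1]] = ctx->regs[op[2]] + ctx->regs[op[3]];
            break;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}
```

For example, the program "SET_IMM r1 5; SET_IMM PC 3; SET_IMM r1 99; HALT" leaves r1 at 5, because writing 3 into the PC register skips the third op. Since an arithmetic op can also target the PC register, computed jumps come out of the same mechanism.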

It's important to realize that M0 is designed to be as powerful as C, just easier to analyze. If an attacker can get a context to execute arbitrary M0, that'll be sufficient to own a machine. Security will be present, but it will live above M0, e.g. M0 bytecode verification or modification of the current context.
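
As one example of security living above M0, a loader could reject bytecode before any context executes it. This sketch assumes four-byte ops (an opcode plus three register operands), three known opcodes and 16 registers; those numbers and the m0_verify name are hypothetical. Because M0 can set the PC directly, a check like this says nothing about where control can go; it only rejects malformed ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define M0_OP_WIDTH 4    /* hypothetical: opcode + three operand bytes */
#define M0_NUM_OPS  3    /* hypothetical: opcodes 0..2 exist */
#define M0_NUM_REGS 16   /* hypothetical register-file size */

/* A load-time check that sits above M0: the ops themselves stay unchecked
 * and C-like, but code failing verification never reaches a context. */
bool m0_verify(const uint8_t *code, size_t len) {
    assert(code != NULL || len == 0);
    if (len % M0_OP_WIDTH != 0)
        return false;                       /* truncated op */
    for (size_t i = 0; i < len; i += M0_OP_WIDTH) {
        if (code[i] >= M0_NUM_OPS)
            return false;                   /* unknown opcode */
        for (size_t j = 1; j < M0_OP_WIDTH; j++)
            if (code[i + j] >= M0_NUM_REGS)
                return false;               /* register index out of range */
    }
    return true;
}
```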

Each context will also have its own REPR and HOW according to jnthn's 6model work. What this means is that we plan on using the MOP as the basis of our contexts. A context will have control over how it implements cloning and subclassing. This will give us numerous specialization possibilities. We can make contexts that only allow a restricted subset of operations for something like PL/Perl6 or a more static-oriented context for low-power embedded or mobile platforms. A context can decide that it will no longer allow itself to be subclassed or cloned, and there'll be no way to do so without circumventing the MOP. All security concerns need a great deal of thought and scrutiny, but I believe that this will give us a solid foundation to build on.
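
One way to picture a context controlling its own cloning is a per-context table of function pointers that the VM must go through, loosely analogous to a HOW. This is a plain C analogy under stated assumptions, not 6model's actual REPR/HOW API; m0_context_how, m0_clone, the sealed table and its "opcodes below 8" rule are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct m0_context;

/* Hypothetical per-context metaobject: the context's own table, not the VM
 * core, decides whether and how the context may be cloned. */
typedef struct m0_context_how {
    struct m0_context *(*clone)(const struct m0_context *self);
    bool (*allows_op)(const struct m0_context *self, unsigned opcode);
} m0_context_how;

typedef struct m0_context {
    const m0_context_how *how;
    unsigned pc;
} m0_context;

/* Default behaviour: a plain memberwise copy, every op allowed. */
static m0_context *default_clone(const m0_context *self) {
    m0_context *copy = malloc(sizeof *copy);
    if (copy != NULL)
        memcpy(copy, self, sizeof *copy);
    return copy;
}
static bool allow_all(const m0_context *self, unsigned opcode) {
    (void)self; (void)opcode;
    return true;
}

/* A locked-down behaviour: refuses to be cloned, allows only opcodes < 8. */
static m0_context *refuse_clone(const m0_context *self) {
    (void)self;
    return NULL;
}
static bool allow_low_ops(const m0_context *self, unsigned opcode) {
    (void)self;
    return opcode < 8;
}

static const m0_context_how default_how = { default_clone, allow_all };
static const m0_context_how sealed_how  = { refuse_clone, allow_low_ops };

/* The VM always asks the context's metaobject instead of copying directly. */
m0_context *m0_clone(const m0_context *ctx) {
    assert(ctx != NULL && ctx->how != NULL);
    return ctx->how->clone(ctx);
}
```

A sandboxed context for something like PL/Perl6 would carry the restrictive table; since cloning only happens through m0_clone, the refusal cannot be sidestepped without bypassing the metaobject itself.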

We will also take advantage of representation polymorphism to allow for different types based on differing storage constraints, e.g. compactness, speed, or compatibility with calling conventions.

The current context will be the first argument to each M0 op. We're now going with a fixed-length, four-argument op format. The context may be implicit or explicit, depending on what we can figure out. A fixed op width will go a very long way toward simplifying any code that needs to work with bytecode. It will be a most welcome change to get away from pbc and its variable-length (and occasionally variadic) ops; it'll be a joy to rip that code out. We need to make sure, though, that this doesn't cause enough pain elsewhere to cancel out the benefit.
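
A short sketch of what the fixed-width format buys, assuming each op is four bytes (one opcode byte plus three operand bytes) and that handlers take the context explicitly as their first parameter. The encoding, m0_decode, m0_exec and op_add_i are illustrative assumptions rather than the settled M0 format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define M0_OP_WIDTH 4   /* assumed: 1 opcode byte + 3 operand bytes */

typedef struct {
    uint8_t opcode;
    uint8_t args[3];
} m0_op;

typedef struct {
    uint64_t regs[256];   /* any 8-bit operand is a valid register index */
} m0_context;

/* Every op handler has the same shape: context first, then three operands. */
typedef void (*m0_op_fn)(m0_context *ctx, uint8_t a, uint8_t b, uint8_t c);

/* Counting ops is plain division; a remainder means a corrupt segment. */
size_t m0_op_count(size_t code_len) {
    assert(code_len % M0_OP_WIDTH == 0);
    return code_len / M0_OP_WIDTH;
}

/* Random access: op n always starts at byte n * M0_OP_WIDTH, so nothing
 * before it has to be decoded first. */
m0_op m0_decode(const uint8_t *code, size_t code_len, size_t n) {
    assert((n + 1) * M0_OP_WIDTH <= code_len);
    const uint8_t *p = code + n * M0_OP_WIDTH;
    m0_op op = { p[0], { p[1], p[2], p[3] } };
    return op;
}

/* Hypothetical handler: regs[a] = regs[b] + regs[c]. */
static void op_add_i(m0_context *ctx, uint8_t a, uint8_t b, uint8_t c) {
    ctx->regs[a] = ctx->regs[b] + ctx->regs[c];
}

/* Dispatch is a single indexed call with a uniform argument list. */
void m0_exec(m0_context *ctx, const m0_op_fn *table, m0_op op) {
    assert(table[op.opcode] != NULL);
    table[op.opcode](ctx, op.args[0], op.args[1], op.args[2]);
}
```

Finding the Nth op, counting ops or walking a segment all reduce to arithmetic on the byte length, which is exactly the code that variable-length pbc ops make painful.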

During the discussion, chromatic wondered out loud if there were a way to make contexts immutable. I'm not entirely sure what he meant, but I'm recording the question here to try to keep it from being forgotten.

With the context-based approach, every function invocation (or any other CPS-based control flow change) creates a clone of the current context and gives it a pointer to its caller. Data from the calling context will be shared copy-on-write (COW) with the called context to avoid excessive memory usage.
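
The copy-on-write scheme can be sketched with a reference-counted register file: invoking a function allocates a new context that points at its caller and shares the caller's registers, and a private copy is made only on the first write. Everything here (m0_regfile, m0_invoke, m0_set and friends) is a hypothetical illustration; the real design may apply COW at a different granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define M0_NUM_REGS 64   /* hypothetical register-file size */

/* Hypothetical reference-counted register file, shareable between contexts. */
typedef struct {
    size_t   refcount;
    uint64_t regs[M0_NUM_REGS];
} m0_regfile;

typedef struct m0_context {
    struct m0_context *caller;   /* who invoked us; NULL at top level */
    m0_regfile        *regfile;  /* possibly shared with the caller */
    size_t             pc;
} m0_context;

m0_context *m0_context_new_root(void) {
    m0_context *ctx = malloc(sizeof *ctx);
    m0_regfile *rf  = calloc(1, sizeof *rf);
    if (ctx == NULL || rf == NULL) { free(ctx); free(rf); return NULL; }
    rf->refcount = 1;
    ctx->caller = NULL; ctx->regfile = rf; ctx->pc = 0;
    return ctx;
}

/* On invocation: clone the caller's context, point the clone back at its
 * caller, and share (not copy) the register file. */
m0_context *m0_invoke(m0_context *caller, size_t target_pc) {
    m0_context *callee = malloc(sizeof *callee);
    if (callee == NULL)
        return NULL;
    callee->caller  = caller;
    callee->regfile = caller->regfile;
    callee->regfile->refcount++;
    callee->pc      = target_pc;
    return callee;
}

uint64_t m0_get(const m0_context *ctx, size_t r) {
    assert(r < M0_NUM_REGS);
    return ctx->regfile->regs[r];
}

/* Copy-on-write: a context gets a private register file only when it writes
 * to a shared one. Returns 0 on success, -1 on allocation failure. */
int m0_set(m0_context *ctx, size_t r, uint64_t value) {
    assert(r < M0_NUM_REGS);
    if (ctx->regfile->refcount > 1) {
        m0_regfile *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->regs, ctx->regfile->regs, sizeof copy->regs);
        copy->refcount = 1;
        ctx->regfile->refcount--;
        ctx->regfile = copy;
    }
    ctx->regfile->regs[r] = value;
    return 0;
}

void m0_context_free(m0_context *ctx) {
    if (--ctx->regfile->refcount == 0)
        free(ctx->regfile);
    free(ctx);
}
```

A callee that only reads its caller's registers never pays for a copy; the first write splits the two contexts.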

One of my burning questions was how CPS could work in a low-level assembly language where there weren't any continuations or closures. The answer is that we'll fake it by using the context as a continuation. We can get at a context's guts by a few simple loads and derefs. I'm a little fuzzy on the details, but I can at least see how it's possible to do CPS in M0 with a bit of hand-waving.
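
Under the assumption that a context records its caller and a return PC, the "context as continuation" idea reduces to pointer juggling: calling saves where to come back to, and returning makes the caller current again at that PC. m0_call, m0_return and the m0_interp struct holding the current context are hypothetical names, and the sketch ignores argument passing and exceptions entirely.

```c
#include <assert.h>
#include <stddef.h>

typedef struct m0_context {
    struct m0_context *caller;
    size_t pc;          /* where this context is currently executing */
    size_t return_pc;   /* where the caller resumes after we return */
} m0_context;

/* Hypothetical: the interp only needs to know which context is current. */
typedef struct {
    m0_context *current;
} m0_interp;

/* Calling: the callee's caller pointer plus the saved return PC together
 * form the continuation; no separate continuation object is built. */
void m0_call(m0_interp *interp, m0_context *callee, size_t target_pc) {
    m0_context *caller = interp->current;
    callee->caller    = caller;
    callee->return_pc = caller->pc + 1;   /* resume after the call op */
    callee->pc        = target_pc;
    interp->current   = callee;
}

/* Returning is "invoking the continuation": a couple of loads, then make
 * the caller current again and jump to the saved return PC. */
void m0_return(m0_interp *interp) {
    m0_context *callee = interp->current;
    m0_context *caller = callee->caller;
    assert(caller != NULL);
    caller->pc      = callee->return_pc;
    interp->current = caller;
}
```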

I had originally intended to reformat all of my notes into a nice post, but it's already close to bedtime and I'm only through the first point. The rest of my notes will have to wait for another day. Until then, here are some of the remaining unanswered questions:

What kind of data belong in the interp and what all do we need in the context? The answers are settling, but there's still some uncertainty.
Where does the GC live? Is it a separate context, part of the interp or something else?
Is manipulating the PC a reasonable primitive to build an ffi on top of?
What pain will be caused by fixed-argument ops? Is it a worthwhile trade-off?
How would an implicit context as the first argument to each op work?
Is it possible to have immutable contexts and to do so more efficiently than straightforward COW'd contexts?

2 comments:

Andrew Johnson, December 15, 2010 at 10:08 AM
The way I believe you're using the terms, the parent of a clone is the thing that was cloned. That's linguistically a bit questionable; if my parents were to clone me they would be the clone's parents, but if I were to clone myself I would be the clone's parent. You might want to make the relationships a little more explicit.

- Andrew (following Lorito with interest but no spare time)
Christoph Otto, December 15, 2010 at 3:41 PM
Andrew, thanks for pointing that out. I was conflating the concept of parent/child with caller/callee. I've edited the post to try and clarify that I was talking about caller/callee relationships.


Tuesday, December 14, 2010

Notes from the Lorito Braindump - Contexts
