reparrot: December 2010

Last Thursday, allison, chromatic, dukeleto and I met to discuss the direction that Lorito was taking and to try to get as much as we could out of chromatic's head and into the wider world. As it turns out, we came up with some significant changes in the design of Lorito as an interpreter, but I think they'll end up being quite beneficial once they solidify a bit. The following summary is a bit less warty and incomplete than the rough notes I nopasted to #parrot as soon as I'd typed them up after the meeting, but there are still a number of unanswered questions. I'll recap these at the end.

Terminology

M0 - Lorito ops. Think of the M as standing for magic: M0 has no magic, i.e. no complex behaviors or subtleties. Higher levels are M1 (anything built from M0, e.g. PIR), M2 (nqp-rx and winxed) and M3 (Rakudo and Partcl).

Context is the new Interp

The biggest decision we made was that contexts would play most of the roles that the interpreter currently fills. They will contain all the mutable state needed by a running program. This includes the PC, registers, return PC, exception handler PC, exception payload and a pointer to the calling context. Some things such as bytecode segments and iglobals will still belong to the interp, but it will be going on a pretty severe diet for Lorito. The GC may or may not live in the interp. We'll flesh this out as we go.
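To make the split concrete, here is a minimal Python sketch of how the state might be divided. The field names, the register count and the use of plain lists are my own placeholders, not anything we settled on at the meeting.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Context:
    """Hypothetical layout: all mutable state of a running program."""
    pc: int = 0
    # Register count of 8 is an arbitrary placeholder.
    registers: list[Any] = field(default_factory=lambda: [0] * 8)
    return_pc: Optional[int] = None
    handler_pc: Optional[int] = None          # exception handler PC
    exception_payload: Any = None
    caller: Optional[Context] = None          # pointer to the calling context


@dataclass
class Interp:
    """Hypothetical slimmed-down interp: only shared, long-lived data."""
    bytecode_segments: list[bytes] = field(default_factory=list)
    iglobals: dict[str, Any] = field(default_factory=dict)


interp = Interp(bytecode_segments=[b""])
main_ctx = Context()
# A called context points back at its caller and remembers where to resume.
callee_ctx = Context(caller=main_ctx, return_pc=main_ctx.pc + 1)
```

Nothing here says where the GC lives; that is still one of the open questions.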

Having an explicit PC also means that a dedicated goto op is no longer necessary in M0. Jumping around within (or between) bytecode segments simply means that the PC is explicitly set to an address rather than automatically incremented. We can also allow the PC to escape into the system stack for ffi, though this idea hasn't been sanity-checked yet and may in fact be insane. This is all, of course, very low-level M0 stuff. Higher-level languages will have all of the proper control flow constructs.
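As a rough illustration of control flow without a goto op, here is a toy dispatch loop in Python. The op names (`set_imm`, `add`, `set_pc_if`, `halt`) and the tuple encoding are invented for this sketch and are not real M0 ops. The only point is that a loop falls out of writing to the PC.

```python
class Context:
    """Toy context: just a PC and a few registers (sizes are assumptions)."""
    def __init__(self, nregs=4):
        self.pc = 0
        self.registers = [0] * nregs


def run(ctx, program):
    """Execute a list of (op, a, b, c) tuples against ctx."""
    while ctx.pc < len(program):
        op, a, b, c = program[ctx.pc]
        ctx.pc += 1                          # default: automatic increment
        if op == "set_imm":                  # registers[a] = b (immediate)
            ctx.registers[a] = b
        elif op == "add":                    # registers[a] = registers[b] + registers[c]
            ctx.registers[a] = ctx.registers[b] + ctx.registers[c]
        elif op == "set_pc_if":              # explicit write to the PC
            if ctx.registers[a] != 0:
                ctx.pc = b
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    return ctx


# Sum 3 + 2 + 1 with a loop built only from PC assignment.
program = [
    ("set_imm", 0, 3, 0),     # r0 = 3 (counter)
    ("set_imm", 1, 0, 0),     # r1 = 0 (accumulator)
    ("set_imm", 2, -1, 0),    # r2 = -1
    ("add", 1, 1, 0),         # r1 = r1 + r0
    ("add", 0, 0, 2),         # r0 = r0 - 1
    ("set_pc_if", 0, 3, 0),   # if r0 != 0, set pc back to index 3
    ("halt", 0, 0, 0),
]
ctx = run(Context(), program)
print(ctx.registers[1])       # prints 6
```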

It's important to realize that M0 is designed to be as powerful as C, just easier to analyze. If an attacker can get a context to execute arbitrary M0, that'll be sufficient to own a machine. Security will be present, but it will live above M0, e.g. M0 bytecode verification or modification of the current context.

Each context will also have its own REPR and HOW according to jnthn's 6model work. What this means is that we plan on using the MOP as the basis of our contexts. A context will have control over how it implements cloning and subclassing. This will give us numerous specialization possibilities. We can make contexts that only allow a restricted subset of operations for something like PL/Perl6 or a more static-oriented context for low-power embedded or mobile platforms. A context can decide that it will no longer allow itself to be subclassed or cloned, and there'll be no way to do so without circumventing the MOP. All security concerns need a great deal of thought and scrutiny, but I believe that this will give us a solid foundation to build on.

We will also take advantage of representation polymorphism to allow for different types based on differing storage constraints, e.g. compactness, speed, or compatibility with calling conventions.

The current context will be the first argument to each M0 op. We're now going with a fixed-length 4 argument op format. The context may be implicit or explicit, depending on what we can figure out. A fixed op width will go a very long way toward simplifying any code that needs to work with bytecode. It will be a most welcome change to get away from pbc and its variable-length (and occasionally variadic) ops. It'll be a joy to rip that code out. We need to make sure that this doesn't cause enough pain in other places to cancel out the benefit.
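Here is a small Python sketch of why a fixed op width is so pleasant to work with. The encoding is an assumption on my part: one opcode word followed by four argument words, each an unsigned 32-bit little-endian integer, with the context as an explicit first argument. None of those details are decided.

```python
import struct

# Assumed layout: opcode + 4 arguments, each an unsigned 32-bit LE integer.
OP = struct.Struct("<5I")


def encode(ops):
    """Pack (opcode, arg0, arg1, arg2, arg3) tuples into bytecode."""
    return b"".join(OP.pack(*op) for op in ops)


def op_count(bytecode):
    """Number of ops is a simple division; no parsing required."""
    if len(bytecode) % OP.size:
        raise ValueError("truncated bytecode")
    return len(bytecode) // OP.size


def decode_at(bytecode, index):
    """Fetch op number `index` directly, without scanning earlier ops."""
    return OP.unpack_from(bytecode, index * OP.size)


# Opcode numbers and argument values below are made up for the example.
bytecode = encode([(1, 0, 1, 2, 3), (2, 0, 4, 5, 6)])
second_op = decode_at(bytecode, 1)
```

With variable-length ops, finding the nth op means decoding every op before it. With a fixed width it is a multiplication.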

During the discussion, chromatic wondered out loud if there were a way to make contexts immutable. I'm not entirely sure what he meant, but I'm recording the question here to try to keep it from being forgotten.

With the context-based approach, on function invocation (or any CPS-based control flow changes), a clone of the context is created and given a pointer to its caller. When this happens, data from the calling context will be COW'd to the called context to avoid excessive memory usage.
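A minimal Python sketch of that invocation step follows. The `CowRegisters` class, the `invoke` helper and the choice to COW only the registers are all assumptions made for illustration. A real implementation would work at a much lower level.

```python
class CowRegisters:
    """Register storage shared between contexts until someone writes."""
    def __init__(self, values):
        self._values = values
        self._owned = False              # True once we hold a private copy

    def clone(self):
        # After cloning, neither side may write in place any more.
        self._owned = False
        return CowRegisters(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        if not self._owned:
            self._values = list(self._values)   # copy on first write
            self._owned = True
        self._values[index] = value


class Context:
    def __init__(self, pc, registers, caller=None, return_pc=None):
        self.pc = pc
        self.registers = registers
        self.caller = caller
        self.return_pc = return_pc


def invoke(current, target_pc):
    """Clone the current context for a call and link it to its caller."""
    return Context(
        pc=target_pc,
        registers=current.registers.clone(),    # shared until written
        caller=current,
        return_pc=current.pc + 1,
    )


caller_ctx = Context(pc=10, registers=CowRegisters([1, 2, 3]))
callee_ctx = invoke(caller_ctx, target_pc=40)
callee_ctx.registers[0] = 99     # callee gets a private copy here
caller_ctx.registers[1] = 7      # caller's write is invisible to the callee
```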

One of my burning questions was how CPS could work in a low-level assembly language where there weren't any continuations or closures. The answer is that we'll fake it by using the context as a continuation. We can get at a context's guts by a few simple loads and derefs. I'm a little fuzzy on the details, but I can at least see how it's possible to do CPS in M0 with a bit of hand-waving.
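Since I'm fuzzy on the details, take the following Python sketch as my own guess at the shape of the idea rather than anything chromatic specified. The `call` and `ret` helpers and the convention of passing values in register 0 are hypothetical. "Returning" is just loading the caller pointer out of the current context, storing a result into it and resuming it.

```python
class Context:
    def __init__(self, pc, caller=None):
        self.pc = pc
        self.registers = [0] * 4         # register count is arbitrary
        self.caller = caller


def call(current, target_pc, arg):
    """Make a new context for the callee; the caller is its continuation."""
    callee = Context(pc=target_pc, caller=current)
    callee.registers[0] = arg            # assumed convention: arg in r0
    return callee                        # callee becomes the current context


def ret(current, value):
    """Invoke the continuation: a load, a store and a PC update."""
    k = current.caller                   # load the continuation (caller context)
    k.registers[0] = value               # assumed convention: result in r0
    k.pc += 1                            # resume just after the call site
    return k                             # caller is the current context again


main_ctx = Context(pc=5)
callee_ctx = call(main_ctx, target_pc=100, arg=20)
resumed = ret(callee_ctx, callee_ctx.registers[0] * 2 + 2)
```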

I had originally intended to reformat all of my notes into a nice post, but it's already close to bedtime and I'm only through the first point. The rest of my notes will have to wait for another day. Until then, here are some of the remaining unanswered questions:

What kind of data belong in the interp and what all do we need in the context? The answers are settling, but there's still some uncertainty.
Where does the GC live? Is it a separate context, part of the interp or something else?
Is manipulating the PC a reasonable primitive to build an ffi on top of?
What pain will be caused by fixed-argument ops? Is it a worthwhile trade-off?
How would an implicit context as the first argument to each op work?
Is it possible to have immutable contexts and to do so more efficiently than straightforward COW'd contexts?

Parrot's roadmaps haven't historically been a great source of encouragement or accurate information. Our goals have often been overly optimistic, with the result that most of the time spent dealing with our roadmap has been spent pushing back uncompleted tasks. The current system has been based on tickets attached to a specific version of Parrot, in the hope that they would be completed by the time that version rolled around. Sometimes the tasks had champions, sometimes not.

Unfortunately these tickets are often placeholders for ideas that are fully-formed only in the mind of one person. This prevents otherwise willing developers from jumping in and makes tasks hard to re-start after a break. There are also tasks that have received a good deal of attention but that simply haven't been completed. These tasks make the roadmap into a reminder of what we haven't accomplished rather than a list of our accomplishments and a source of encouragement.

Parrot's hackers have been hard at work making valuable contributions, but work has been largely independent of the current roadmap. It's always a challenge to keep an accurate roadmap in a project based on volunteer tuits, but whiteknight and I are sure that we can do better.

He and I chatted briefly on #parrot earlier this evening about how we want to structure Parrot's roadmap in the future. What we'd propose follows:

The roadmap will be based on major versions (essentially calendar years). Each year at the post-x.0 Parrot Developer's Summit, we will finalize the roadmap for that year. This roadmap will be wiki-based, since the wiki integrates nicely with Trac's ticket system but also allows a more flexible structuring of information. We will have a solid plan for the next year centered around the supported (.0, .3, .6 and .9) releases. The roadmap will list only major features which have a champion* and which we are confident we will be able to deliver. If we aren't confident of being able to deliver a feature in time for a supported release, it's better to have a release with no planned roadmap items than to have a pleasant fiction. We will also have a fuzzy plan for the following year, though it shouldn't be considered binding. Anything beyond two years will be planned only in a very general sense. We will maintain a wishlist for tasks which we want to undertake but don't have any dedicated volunteers, so that such features won't be lost or clog up the roadmap.

Parrot has an unfortunate history of over-promising and under-delivering. This has not helped our reputation among other OSS hackers and I want us to correct the trend. I want our new roadmaps to center around promising only what we're highly confident of being able to deliver. Establishing a track record will take time and effort, but two or three years from now I want to be able to look back with pride and say that we proved we could deliver what we promised.

*In this case, a champion means that this person is dedicated to seeing a feature to completion. "Owner" is another way of communicating the idea.

Tuesday, December 14, 2010

Notes from the Lorito Braindump - Contexts

Monday, December 6, 2010

Roadmaps: Fact or Fiction?