Wednesday, July 20, 2011

When Interpreters Collide

Note: this post is about implementing an M0 interpreter in Perl and is more a lightly edited braindump than a polished presentation of a concept.

Recently some test failures in M0's test suite revealed that the prototype Perl interpreter had been sneaking some of its perl-nature into the implementation.  The M0 assembler had been storing all values as strings and the interpreter had been secretly using its perlishness to convert the number-like values into ints at runtime.  This doesn't work well for an M0 implementation because M0 needs to be very specific about the low-level behavior of an implementation and the way it treats registers.
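
The difference between "a value stored as text" and "a value stored as an integer" is easy to see in a language that never converts between them implicitly. The following is a minimal sketch in C, not code from the M0 prototype; the function names are hypothetical and the 8-byte width is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat the first 8 bytes of the text as a raw
 * integer, with no parsing at all. Shorter strings are zero-padded. */
int64_t raw_bytes_as_int(const char *text) {
    unsigned char buf[8] = {0};
    int64_t value;
    size_t len;

    assert(text != NULL);
    len = strlen(text);
    if (len > sizeof buf) {
        len = sizeof buf;
    }
    memcpy(buf, text, len);
    memcpy(&value, buf, sizeof value);
    return value;
}

/* Hypothetical helper: explicitly parse the text as a decimal number.
 * This is the step Perl performs silently when a string is used as a number. */
int64_t parsed_as_int(const char *text) {
    assert(text != NULL);
    return strtoll(text, NULL, 10);
}
```

Here `parsed_as_int("42")` yields 42, while `raw_bytes_as_int("42")` yields whatever integer the character codes for '4' and '2' happen to spell on the host machine. An interpreter that relies on the first behavior without saying so is making a decision the spec never made.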

Perl is not C, and the basic problem I'm running into is that Perl is not designed to operate at the low level that M0 (as it currently stands) requires.  M0 is all about bytes and assigning meaning to the value in a register by using certain classes of ops on it.  Perl is much higher-level and doesn't even have a particularly strong distinction between strings and integer values.  If I want Perl to have string byte-oriented C-like semantics, it means that I'll be widely (ab)using the bytes pragma and pack/unpack.  This is doable, but it's also torturing Perl into implementing something even further from its intended use case than the current (and subtly-incorrect) M0 implementation already is.  sorear rightly freaked out when he looked at the M0 interp code, because it's doing something that Perl wasn't intended to do and something that Perl isn't particularly well-suited to.
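
To make the "bytes whose meaning comes from the op" model concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the `m0_reg` type, the 8-byte register width, the helper names, and the two ops are hypothetical and are not taken from the M0 spec or from either implementation.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical register: 8 raw bytes with no inherent type. */
typedef struct {
    unsigned char bytes[8];
} m0_reg;

static_assert(sizeof(double) == sizeof(int64_t),
              "this sketch assumes 64-bit doubles");

/* Helpers that copy a typed value into or out of the raw bytes. */
m0_reg reg_from_int(int64_t v) { m0_reg r; memcpy(r.bytes, &v, sizeof v); return r; }
m0_reg reg_from_num(double v)  { m0_reg r; memcpy(r.bytes, &v, sizeof v); return r; }
int64_t reg_as_int(m0_reg r)   { int64_t v; memcpy(&v, r.bytes, sizeof v); return v; }
double  reg_as_num(m0_reg r)   { double v;  memcpy(&v, r.bytes, sizeof v); return v; }

/* Integer op: interprets both registers as 64-bit integers. */
m0_reg add_i(m0_reg a, m0_reg b) {
    /* Add as unsigned so overflow wraps instead of being undefined. */
    uint64_t sum = (uint64_t)reg_as_int(a) + (uint64_t)reg_as_int(b);
    return reg_from_int((int64_t)sum);
}

/* Floating-point op: interprets the same kind of register as a double. */
m0_reg add_n(m0_reg a, m0_reg b) {
    return reg_from_num(reg_as_num(a) + reg_as_num(b));
}
```

Nothing stops a program from calling `add_i` on registers that were filled with `reg_from_num`; the op simply adds the bit patterns as integers. That is the behavior a Perl implementation has to reproduce by hand with pack/unpack, because Perl's scalars would otherwise pick a "sensible" conversion on their own.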

Still, JavaScript has been used to emulate at least x86, 6502, Z80 and 5A22, with surprisingly reasonable performance.  Arguably that's also pretty far from JavaScript's intended use case, and still it works.  This may just be an issue of finding the least hacky way to do something inherently very hacky.

The alternative is to specify M0 to have flexible underlying semantics, but I don't know that it'd be either practical or advisable to go too far down this road.  It's worth giving some thought to making the M0 spec be minimally unnatural to implement in a high-level language, but M0 is by its nature a low-level beast.  Implementations are bound to reflect that to some extent.

In the end, the best way forward will probably be to plow through the craziness of implementing a simplified CPU in Perl and look forward to building on chromatic's C implementation, where the intent of the implementation language is much closer to the aim of the project.

1 comment:

  1. Perl's a high level language, true, but I think it should be noted that the Perl6 specification intentionally addressed the limitations you mention, for example, by speccing native types and even going so far as to say certain things composed of native types will be structurally packed.

    Perl5's solution to this was to XS all the ugliness into C-ville. Unfortunately this resulted in wrapper modules that often were either too restrictive, or gigantic in comparison to the interfaces they were designed to access. With what is currently specced for Perl6, much of what is currently done with XS modules and guts magic could be done in the native language. The only challenge is getting a structured object to "point" at a live SHM area or hardware register bank. Then you can theoretically enjoy autoboxed OO goodness while working directly with explicitly structured data.

    IMO, as much as I love Perl5, it has always fallen short when it comes to things like dissecting and manipulating structured network packets, interfacing directly with hardware MMIO regions/SHM segments, or storing things very efficiently when you know it matters for reasons of algorithmic complexity. It is pretty near unusable for many tasks as a result.

    The lack of the ability to interface gracefully with the "real world," where explicit structural format is prevalent, has been the primary reason for me NOT using Perl5 to do certain things on many occasions.

    Unfortunately, this "feature set" tends to run afoul of the occasional "pointers are dangerous" sensibilities of those who generally never lay eyes on bare metal, as well as not being very interesting to developers, so it has lagged behind in implementation.