You might already see where this is going. When dynop libraries were loaded in a different order between compilation and loading, the dynop offsets within the interpreter's op tables would no longer be valid and hilarity would ensue. Also, by hilarity I mean segfaults. This could happen when a program which loads foo_ops followed by bar_ops was compiled to pbc. If that pbc were then loaded by a separate program which loaded bar_ops first, boom.
plobsing took it upon himself to fix Parrot so that dynop libraries could be loaded in any order without invalidating previously compiled bytecode. His solution was to do away with the per-interpreter op tables and move the op tables down into the excessively-capitalized PackFile_ByteCode struct. When ops are added by a bytecode segment, they're given entries in the op mapping table. The offset into those tables is what's stored in bytecode. The first op used will always get 0x00, the second will get 0x01, etc, no matter what the ops are. If you've been looking at pbc_dump's disassembly output, this is why the op numbers don't correlate with the numbers in src/ops/core_ops.c after the dynop_mapping merge. As part of the process of wrapping my head aroung plobsing's changes, I modified pbc_dump to output op mappings as well as the disassembled ops:
cotto:/usr/src/parrot $ ./pbc_dump -d hello.pbc
<snip>
BYTECODE_hello.pir => [ # 5 ops at offs 0x30
map #0 => [
oplib: "core_ops" version 2.9.1 (3 ops)
00000000 => 00000164 (say_sc)
00000001 => 00000022 (set_returns_pc)
00000002 => 0000001d (returncc)
]
0000: 00000000 00000000 say_sc
0002: 00000001 00000001 set_returns_pc
0004: 00000002 returncc
]
<snip>
Since I'm trying to reimplement the code in the PackFile PMCs, it was important to figure out how this code works at a low level so that non-imcc code can once again build a valid pbc file. For this to work, the PackFile PMCs need to be updated to do the same thing that imcc's pbc code does now. The first question, then, is what exactly the current code does. This breaks down into three stages: loading a packfile from a stream (usually a file), executing loaded bytecode and serializing bytecode to a stream.
Execution is the simplest change. In C, it means that code that deals with ops now needs to perform lookups on a packfile bytecode segment's op tables rather than on the interpreter's (now removed) global op tables. There are two important tables; op_info_table, which contains information on ops such as their names, family, arguments, etc; and op_func_table, which contains a list of pointers to the op functions. There's also save_func_table, which is used as temporary storage when something messes with op_func_table. These three pointers now live in the PackFile_ByteCode struct, so most code that deals with ops only needs to be changed as follows:
- op_info_t * const op_info = interp->op_info_table[*base_pc];
+ op_info_t * const op_info = interp->code->op_info_table[*base_pc];
The value of *pc will generally be lower, but that's an implemention detail.For storing and loading, the PackFile_ByteCode_OpMappingEntry, PackFile_ByteCode_OpMappping and PackFile_ByteCode structs (see https://github.com/parrot/parrot/blob/master/include/parrot/packfile.h#L235 ) are used. Because the bytecode segment (the PackFile_ByteCode struct) now contains op maps, the op maps need to be stored and loaded before the bytecode segment can be meaningfully used. An op map (PackFile_OpMapping) consists of an array of entries, with each entry contiaining all the mappings which use the same library. In the simple case where all ops are core, the op map will have only one entry, for the "core_ops" library. "core_ops" is the name for the ops that are built as part of Parrot and are always available. There will be another op mapping entry for each loaded dynop library such as "perl6_ops" or "math_ops".
The contents of an op mapping entry are minimal. The PackFile_OpMapping_Entry contains the name of the library (*lib), the number of ops (n_ops) and two arrays called lib_ops and table_ops. table_ops is an op's number according to the op mapping table and lib_ops is its number within an op library. When imcc needs to look up an op's number (using this function), it will ensure that the necessary library is loaded and perform a linear search through through all mapped ops and all loaded op libraries looking for an op with the correct function pointer. When it finds a previously unmapped op, it will add it to the entry for the right library and return its index.
This is a problem because a single packfile implementation isn't good enough. We actually have five. And by five, I mean two. The first implementation is the one that works and is implemented as C structs and functions. The second implementation is a PMC-based interface which is intended to allow PIR code to generate valid pbc. (It also allows the generation of wildly invalid pbc with hilarious results, but that's an unintended benefit.) The PMCs are what PIRATE uses to generate pbc that worked with Parrot before the dynop_mapping branch merged. The packfile PMCs are largely untested apart from PIRATE, so because the pbc format change didn't cause any new failures for those PMC, they were never updated.
The packfile PMCs are important because they're the future. imcc, which is our current PIR compiler, is widely disliked and has been used by at least one developer to frighten his children. imcc's code has several performance issues, poor maintainability and an undesirably low bus number. It's also tied into Parrot's internals much too tightly for anyone's good. Once PIRATE is ready, I want us to be able to rip out imcc and use the parrot executable as nothing more than a bytecode interpreter. In addition to decoupling imcc from Parrot, this will let us use more self-hosted tools and will help us work out how to make pbc manipulation more accessible to Parrot's external users.
Making the Packfile PMCs opmap-aware is an important step because it will mean that pir code will once again be able to produce valid pbc. From there, world domination is a smop.
* As always, Parrot is better off because people mentioned problems they ran into. There was some pain in the interim, but Parrot is more robust as a result of the reports we received
This is a great description of the current state of affairs. Please keep these detailed high-level summaries coming!
ReplyDelete