HYDROGEN
HYDROGEN is not a part of the ARGON abstract specification - it's actually an implementation component. No ARGON user code will depend upon HYDROGEN, but the implementation of certain parts of ARGON itself will.
HYDROGEN is a hardware abstraction layer; the core abstraction of the underlying node, on which my first ARGON implementation will build. The node resources HYDROGEN must provide a portable interface to are:
- An address space. HYDROGEN provides a mechanism to request a pointer to a block of memory, to resize that block (possibly moving it to a new pointer), and to deallocate it. Pointer arithmetic may be performed. The address space may also contain regions of memory not managed in dynamic blocks - code cannot rely on being able to resize or free a block unless it knows it has been dynamically allocated. The amount of memory available for allocation should be measurable.
- Processors. Any number of them. Each of them executes the same form of machine code, from the same address space. The state of the processor can be stored into a buffer in the address space, and another state loaded, to enable context switching. Whatever machine code the processors use, they will present an abstraction of a multiple-stack machine with a small set of specialised registers, one of which points to a "thread state block". Each fundamental type listed below may, or may not, have its own stack - or they may all be implemented on a single stack. In other words, code that has just pushed a value of some type onto "the stack" must consume that value with a word expecting that type on "the stack". That way, code will not trip up on implementation details.
- The dictionary. This is a linked list of words; words are blocks of machine code identified by a textual name, with some associated flags and other parameters (such as the execution time bounds mentioned below). The system will pre-fill the dictionary with words providing all of the facilities defined in this list.
- Arithmetic. The processors must provide (either natively or via emulation) a basic integer type, byte-sized integers, and double-length integers, each in signed and unsigned versions, plus the IEEE floating point types and a pointer type. There must be words to compare and perform arithmetic upon them, words to find the size of each of these types in bytes (except for the byte type, which is self-explanatory), and words to load and store objects of each type given a pointer.
- ISO10646 (Unicode, to all but the trained eye) characters, with words to compare them, extract their properties from a Unicode database, load and store characters from pointers into the address space, and to find the size of a character in bytes.
- A stream abstraction to memory. A standard API can be used to write the aforementioned data types to a stream; behind the scenes, the stream system builds up a linked list of buffers which grows as more is written to the stream. When the stream is complete, the stream module will convert it to a single block of heap memory and return a pointer to its start, and its length. This facility mainly exists to support the next one...
- Code generation. HYDROGEN provides standard interfaces to generate code from a set of primitive stack-based operations, including flow control. Code is generated straight into memory, meaning the code generator can look at the capabilities of the attached physical processor to decide how to generate optimal code; it doesn't need to be relocatable, portable, machine code that can be saved to disk. Also, this means it can inline easily.
- An interpreter, which (given an input stream of some kind) will sit in a loop until the stream ends, fetching a single whitespace-delimited word from the input stream and locating it in the dictionary. If the InterpreterCompilationStream pointer in the thread state block is set, and the located word DOES NOT have the IMMEDIATE flag set, the interpreter compiles a call to that word to the given compilation stream; otherwise it executes the word there and then. The interpreter must be careful about its use of the stacks, such that when two words are executed in turn, the items left on top of the stacks by the first word are still there when the second word executes; and the interpreter must be able to live with the fact that the words it executes may add and remove items from the stacks at will. There will be standard words for creating new words, which will set up a compilation stream and point the InterpreterCompilationStream pointer at it; subsequent code will then be compiled rather than executed, until the IMMEDIATE-flagged word that ends a word definition is encountered (and, due to its IMMEDIATE flag, executed immediately). That word will compile a RETURN instruction, extract the finally generated code from the stream, then create a new word entry in the dictionary with the specified name and flags, pointing to the newly compiled code.
- A bootstrap image store. The bootstrap image is a single large string of source code to be fed to the interpreter, running on just one CPU while the others remain idle, when the HYDROGEN runtime system is first started; when it ceases executing, all CPUs start executing a word called MAIN (which the bootstrap image should have defined). How this image is stored is implementation dependent; HYDROGEN on top of POSIX will store it in a file, while raw-hardware implementations might store it in FLASH ROM or a special section of disk set aside for it. There is a word available to install a new bootstrap image, too. It is recommended that there be an external way to update the bootstrap image (with the POSIX implementation, this comes for free - just overwrite the file) or to roll back to a previous version, just in case an update causes the system to become unbootable.
- Asynchronous notifications; signals or interrupts or traps, call 'em what you like. In particular, a resettable timer interrupt is needed to implement preemptive multitasking, as well as notifications of the completion of I/O operations, and any other notifications provided by the particular hardware.
- Bounded execution time. Every primitive that can be compiled by the code generation library has an expected case and a worst case execution time, which can be obtained. Any operation that cannot be expected to be completed in a bounded time period is implemented asynchronously; an asynchronous notification handler is provided with the request, which is invoked when the request is completed. Platforms in which arbitrary delays may be uncontrollably inserted - such as when running on a non-real-time host OS - can, of course, declare their inability to make guarantees. This facility is only required for hard real time scheduling, which is an optional capability for an ARGON node.
- Low level locking with simple mutexes. The mutex claim operation needs an asynchronous notification handler to be provided; if the mutex is currently unlocked, the claim succeeds at once: it returns 'True' and ignores the handler. If not, it returns 'False', and queues the handler to be granted the lock (and notified of such) at a later date.
- Devices. A list of hardware devices is provided; each entry consists of a device name, a device type code, and an opaque device ID pointer. This list may change with time, too; asynchronous notifications will announce changes. Each type of device supported by an implementation will also have an associated library for accessing devices of that type (passing the device ID pointer as a parameter to all operations).
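The interpreter's compile-or-execute rule from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HYDROGEN's actual design: the dictionary is a Python dict rather than a linked list, "compiling a call" just appends the word to a list, there is a single shared stack, unknown tokens are treated as integer literals, and the defining words are named ":" and ";" purely for the example. None of those choices come from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Word:
    name: str
    code: Callable[["Thread"], None]
    immediate: bool = False  # the IMMEDIATE flag


@dataclass
class Thread:
    # Stands in for the thread state block plus one shared stack.
    stack: list = field(default_factory=list)
    dictionary: dict = field(default_factory=dict)
    compilation_stream: Optional[list] = None  # InterpreterCompilationStream
    tokens: list = field(default_factory=list)  # input not yet consumed
    pending_name: Optional[str] = None


def interpret(thread, source):
    thread.tokens = source.split()  # whitespace-delimited words
    while thread.tokens:
        token = thread.tokens.pop(0)
        word = thread.dictionary.get(token)
        if word is None:
            # Assumption: anything not in the dictionary is an integer literal.
            value = int(token)
            word = Word(token, lambda th, v=value: th.stack.append(v))
        if thread.compilation_stream is not None and not word.immediate:
            thread.compilation_stream.append(word)  # compile a call
        else:
            word.code(thread)  # execute there and then


def begin_definition(thread):
    # Hypothetical defining word: reads the new name, opens a stream.
    thread.pending_name = thread.tokens.pop(0)
    thread.compilation_stream = []


def end_definition(thread):
    # Hypothetical IMMEDIATE word: closes the stream, adds the new word.
    body = thread.compilation_stream
    thread.compilation_stream = None

    def run(th, body=body):
        for compiled in body:
            compiled.code(th)
        # Falling off the end plays the part of the RETURN instruction.

    thread.dictionary[thread.pending_name] = Word(thread.pending_name, run)


def binary(op):
    def code(th):
        b = th.stack.pop()
        a = th.stack.pop()
        th.stack.append(op(a, b))
    return code


def make_thread():
    thread = Thread()
    for word in (
        Word("+", binary(lambda a, b: a + b)),
        Word("*", binary(lambda a, b: a * b)),
        Word("DUP", lambda th: th.stack.append(th.stack[-1])),
        Word(":", begin_definition),
        Word(";", end_definition, immediate=True),
    ):
        thread.dictionary[word.name] = word
    return thread
```

Feeding `": SQUARE DUP * ; 7 SQUARE 1 +"` to `interpret` on a thread from `make_thread()` defines SQUARE while the compilation stream is set, then executes the remaining words directly. Note that the interpreter keeps its own state off the stack, so items left by one word are still there for the next.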
Particular devices that would be commonly found on the devices list are:
- A console. This isn't just a bidirectional stream of characters like a Unix console - this is designed to better handle multiple threads of execution spewing out debugging messages while you're trying to type. The console consists of a status area, which contains a number of system variables (each consisting of a name, a value, and a condition code: OK, WARNING, or ERROR); an output area, to which messages are appended; and an input area, which is inactive unless an input operation is in progress. When requesting an input operation, the layer above must provide a prompt string, and the address of an asynchronous notification handler to activate when the response string is ready. The console may be redirected in software, by attaching to the device driver and receiving the status area and output area updates as they come, and providing an input handler. It is expected that the hardware console itself should continue to operate when this happens, with inputs being satisfied by whichever responds first, and with some hardware key that can be used to lock out the software console for security. It is also hoped that if the hardware console has an alerting siren or error lights or other such attention-gathering mechanisms, they can be set to trigger when a status area variable goes to the ERROR state, and stop when no variables are in that state or a 'shut up' button is pressed.
- Block I/O. One or more persistent storage devices organised into 512 byte blocks. I/O is, of course, asynchronous.
- Network interfaces. Something very like the BSD sockets API is available. Interestingly, serial links are also handled by this, as a special network which can only allow one type of connection (stream) to one other host, and even then only one connection at a time.
- Host OS services "devices". An implementation on top of POSIX would provide a filesystem device for access to the underlying file system, a syslog device, a device to execute native commands (with I/O redirection), a device to load and call native shared libraries, and so on. An implementation on top of Windows would provide many of those, but with a facility for accessing DLLs instead of shared libraries.
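To make the console's three areas concrete, here is a small Python model of the behaviour described in the console entry above. It is a sketch only: the class and method names are invented for illustration, a single pending input request is assumed, and the siren is modelled as a plain boolean.

```python
from enum import Enum


class Condition(Enum):
    OK = "OK"
    WARNING = "WARNING"
    ERROR = "ERROR"


class Console:
    """Hypothetical in-memory model of the console device."""

    def __init__(self):
        self.status = {}       # status area: name -> (value, Condition)
        self.output = []       # output area: messages are only appended
        self.prompt = None     # input area: None while inactive
        self._handler = None   # notification handler for the response
        self.siren_on = False

    def set_status(self, name, value, condition):
        previous = self.status.get(name, (None, Condition.OK))[1]
        self.status[name] = (value, condition)
        if condition is Condition.ERROR and previous is not Condition.ERROR:
            self.siren_on = True   # a variable has just gone to ERROR
        elif not any(c is Condition.ERROR for _, c in self.status.values()):
            self.siren_on = False  # nothing is in the ERROR state any more

    def shut_up(self):
        self.siren_on = False      # the 'shut up' button

    def write(self, message):
        self.output.append(message)

    def request_input(self, prompt, handler):
        # Assumption: only one input operation may be in progress.
        if self._handler is not None:
            raise RuntimeError("an input operation is already in progress")
        self.prompt, self._handler = prompt, handler

    def respond(self, text):
        # Called by the hardware console or by a software redirection;
        # whichever responds first satisfies the request.
        if self._handler is None:
            return False
        handler, self.prompt, self._handler = self._handler, None, None
        handler(text)
        return True
```

Because the siren is triggered on the transition into ERROR rather than recomputed on every update, pressing the button silences it until a fresh error appears, even if the old error is still showing in the status area.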
As well as the above, which are used by the ARGON core, the list would include any other devices on the machine - scanners, graphics terminals, etc. These are made accessible via the node entity through MERCURY adaption interfaces for each driver type.
The code of the ARGON core is to be written in terms of the HYDROGEN code generation primitives, so wherever HYDROGEN is ported, ARGON will be able to run.
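Code generation, in turn, leans on the memory stream facility listed earlier: output is written piecemeal into a growing chain of buffers, and only flattened into one block when the stream is complete. A minimal Python sketch of that idea follows. The class name, the fixed buffer size, the little-endian 64-bit encodings and the use of a Python list in place of a linked list are all assumptions made for the example.

```python
import struct


class MemoryStream:
    """Hypothetical stream that grows as a chain of fixed-size buffers."""

    def __init__(self, buffer_size=16):
        self._buffer_size = buffer_size
        self._buffers = [bytearray()]  # stands in for the linked list

    def write_bytes(self, data):
        data = memoryview(bytes(data))
        while len(data):
            tail = self._buffers[-1]
            room = self._buffer_size - len(tail)
            if room == 0:
                # Current buffer is full: chain on a fresh one.
                tail = bytearray()
                self._buffers.append(tail)
                room = self._buffer_size
            tail.extend(data[:room])
            data = data[room:]

    def write_int(self, value):
        # Assumed encoding: signed 64-bit, little-endian.
        self.write_bytes(struct.pack("<q", value))

    def write_float(self, value):
        # Assumed encoding: IEEE double, little-endian.
        self.write_bytes(struct.pack("<d", value))

    def finish(self):
        # Convert the chain to a single block; return it with its length.
        block = b"".join(self._buffers)
        return block, len(block)
```

Writes never copy what has already been written; the one copy happens in `finish`, which mirrors the "convert it to a single block of heap memory" step.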
HYDROGEN can be implemented highly portably as a bytecode interpreter written on top of POSIX; a number of POSIX threads could be initiated to act as the 'processors', and POSIX mutexes used for the locking facility. Also, for specific target architectures, versions that compile to native code can be produced.
An implementation of HYDROGEN on top of POSIX would be called argond.
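As an illustration of how the non-blocking mutex claim could sit on top of an ordinary host lock, here is a Python sketch. It is not taken from argond: the class name, the FIFO ordering of waiters and the `release` operation are assumptions, and `threading.Lock` merely plays the role a POSIX mutex might play in guarding the bookkeeping.

```python
import threading
from collections import deque


class AsyncMutex:
    """Hypothetical mutex whose claim never blocks the caller."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the fields below
        self._locked = False
        self._waiters = deque()         # handlers awaiting the lock

    def claim(self, handler):
        with self._guard:
            if not self._locked:
                self._locked = True
                return True             # granted at once; handler ignored
            self._waiters.append(handler)
            return False                # handler is notified later

    def release(self):
        with self._guard:
            if not self._waiters:
                self._locked = False
                return
            handler = self._waiters.popleft()
        # The mutex stays locked: ownership passes straight to the waiter,
        # which is notified outside the guard so it can call back in.
        handler()
```

A caller that gets 'True' carries on immediately; one that gets 'False' simply returns and continues its work inside the handler once notified, which is what keeps every operation bounded in time.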
STATUS
HYDROGEN is being implemented as a portable bytecode interpreter on top of POSIX as we speak. A couple of prototypes have been made to run, and a new implementation is now being put together incorporating the results of those experiments!