
IRON is the data model upon which ARGON is built. It's the abstract model for data sent over MERCURY, and the underlying representation for CARBON knowledge bases, and the homoiconic representation for CHROME source code.

As well as the abstract model, a textual encoding is defined for humans to read and write, along with compact binary encodings for storage and communication.

Model and Textual Encoding

I have a fine blog post describing the concepts and rationale behind the data model.

Integer numbers

Integers are written as a sequence of decimal digits, or the sequence 0x followed by a sequence of hexadecimal digits, or the sequence 0b followed by a sequence of binary digits; all optionally prefixed with - to indicate a negative number. Alternatively, the special sequences #-inf and #+inf represent negative infinity and positive infinity respectively.
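As an illustration of the rules above, here is a minimal Python sketch of an integer reader. The function name parse_iron_integer is hypothetical, and accepting upper-case hexadecimal digits and mapping the infinities onto math.inf are assumptions of this sketch rather than anything the text specifies.

```python
import math
import re

# Optional sign, then 0x + hex digits, 0b + binary digits, or plain decimal digits.
# Assumption: hexadecimal digits may be upper or lower case.
_INTEGER = re.compile(r"(-?)(?:0x([0-9a-fA-F]+)|0b([01]+)|([0-9]+))")

def parse_iron_integer(text):
    """Return an int, or +/-math.inf for the two special infinity sequences."""
    if text == "#+inf":
        return math.inf
    if text == "#-inf":
        return -math.inf
    match = _INTEGER.fullmatch(text)
    if match is None:
        raise ValueError(f"not an IRON integer: {text!r}")
    sign, hex_digits, bin_digits, dec_digits = match.groups()
    if hex_digits is not None:
        magnitude = int(hex_digits, 16)
    elif bin_digits is not None:
        magnitude = int(bin_digits, 2)
    else:
        magnitude = int(dec_digits, 10)
    return -magnitude if sign else magnitude
```

For example, parse_iron_integer("-0x1F") gives -31 and parse_iron_integer("0b101") gives 5.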

Rational numbers

Rationals are written in one of two forms: either a finite integer (as described above) followed immediately by a / then a positive finite integer in the same base as the first integer (no further 0x or 0b prefix is necessary or allowed) to represent an arbitrary fraction, or an integer (as described above) with a . character inserted somewhere within it (except at the very end), representing a radix point. The latter form may also be suffixed with an exponent written in the form of an ^ character followed by a further finite integer, in the same base as the first numeric component (no further 0x or 0b prefix is necessary or allowed). The value of the number is, in this case, multiplied by the base to the power of the exponent.
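The two rational forms can be sketched in Python with exact fractions. This is only an illustration: parse_iron_rational is a hypothetical name, the sketch assumes at least one digit on each side of the radix point, and it assumes the exponent may carry its own - sign.

```python
from fractions import Fraction
import re

_DIGITS = {10: "0-9", 16: "0-9a-fA-F", 2: "01"}

def _split_prefix(text):
    """Strip the optional sign and 0x/0b prefix; return (sign, base, rest)."""
    sign = 1
    if text.startswith("-"):
        sign, text = -1, text[1:]
    if text.startswith("0x"):
        return sign, 16, text[2:]
    if text.startswith("0b"):
        return sign, 2, text[2:]
    return sign, 10, text

def parse_iron_rational(text):
    sign, base, rest = _split_prefix(text)
    d = _DIGITS[base]
    # Form 1: numerator/denominator, both in the same base, no second prefix.
    match = re.fullmatch(rf"([{d}]+)/([{d}]+)", rest)
    if match:
        denominator = int(match.group(2), base)
        if denominator == 0:
            raise ValueError("the denominator must be positive")
        return sign * Fraction(int(match.group(1), base), denominator)
    # Form 2: digits with a radix point, optionally followed by ^exponent.
    match = re.fullmatch(rf"([{d}]+)\.([{d}]+)(?:\^(-?[{d}]+))?", rest)
    if match is None:
        raise ValueError(f"not an IRON rational: {text!r}")
    whole, frac, exponent = match.groups()
    value = Fraction(int(whole + frac, base), base ** len(frac))
    if exponent is not None:
        # The value is multiplied by the base raised to the exponent.
        value *= Fraction(base) ** int(exponent, base)
    return sign * value
```

Under these assumptions, 1.5^3 reads as 1500 and 0x0.8 reads as one half.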

Inexact numbers

Inexact numbers are written either as an integer or rational, prefixed with a ~ symbol, indicating that there is some arbitrary uncertainty in the accuracy of the number. A future extension may allow specifying a precision.

Characters

Any glyph that can be represented in Unicode can be an IRON character. This explicitly does not include any invisible (except for whitespace) or "control" characters, and does include any sequence of combining characters with a final non-combining character. They can be written with the sequence #' followed by the character then a closing ', or a sequence of the form #ucs( followed by zero or more hexadecimal UCS codepoints for combining characters each followed by +, then the UCS codepoint for a non-combining character, then ).
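A rough Python sketch of reading both character forms follows. The name parse_iron_character is hypothetical; the sketch returns the code points in the order they are written and does not check whether each one is actually combining or non-combining.

```python
import re

# One or more hexadecimal code points joined by "+", wrapped in #ucs( ... ).
_UCS = re.compile(r"#ucs\(((?:[0-9a-fA-F]+\+)*[0-9a-fA-F]+)\)")

def parse_iron_character(text):
    """Return the character's code points as a tuple, in the order written."""
    # Quoted form: #' then the character (possibly several code points) then '.
    if len(text) > 3 and text.startswith("#'") and text.endswith("'"):
        return tuple(ord(ch) for ch in text[2:-1])
    match = _UCS.fullmatch(text)
    if match is None:
        raise ValueError(f"not an IRON character: {text!r}")
    return tuple(int(part, 16) for part in match.group(1).split("+"))
```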

Booleans

Booleans may be written #true or #false.

Symbols

IRON symbols have a rich internal structure, and can be written using various shorthands. Fundamentally, an IRON symbol is a list of strings, representing a hierarchical path through the CARBON directory, giving symbols a global identity. These strings are known as "components" and can be any non-empty strings of IRON characters.

A symbol may be written in its full form as the components of the symbol, in order, each prefixed by a / character, terminating in whitespace; any characters in the components that would create ambiguities with other syntax being prefixed with a \.

For instance, /foo/bar/baz.

A symbol may also be written in abbreviated form, with reference to a "current namespace" defined in context, as a sequence of one or more path components separated by /, with characters that would cause problems prefixed with a \ as usual. Such a symbol's value is found by concatenating the current namespace symbol with the additional components contained in this representation.

For instance, bar/baz.

Or it can be written with reference to a previously declared namespace binding. This is written as a single symbol component known as the "prefix" (with all the usual escaping rules), followed by # then one or more additional symbol components, separated by / characters. The actual value of the symbol is the symbol bound to the prefix in the current context, with the remaining components appended to it.

For instance, C#object:title:language:.
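The three written forms all resolve to the same kind of value, a list of components. The following Python sketch shows one way that resolution might work; resolve_symbol, default_namespace and bindings are hypothetical names, and the binding of C to /argon/carbon in the usage note is an assumption made for illustration.

```python
def resolve_symbol(text, default_namespace=(), bindings=None):
    """Resolve a written symbol to a tuple of components."""
    bindings = bindings or {}
    absolute = text.startswith("/")
    components, current, prefix = [], [], None
    i = 1 if absolute else 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash makes the next character an ordinary one.
            current.append(text[i + 1])
            i += 2
            continue
        if ch == "/":
            components.append("".join(current))
            current = []
        elif ch == "#" and prefix is None and not absolute and not components:
            # Only the first component can act as a namespace prefix.
            prefix = "".join(current)
            current = []
        else:
            current.append(ch)
        i += 1
    components.append("".join(current))
    if "" in components or prefix == "":
        raise ValueError("symbol components must be non-empty")
    if absolute:
        return tuple(components)
    if prefix is not None:
        return tuple(bindings[prefix]) + tuple(components)
    return tuple(default_namespace) + tuple(components)
```

With a default namespace of /foo, bar/baz resolves to the same components as /foo/bar/baz; with C bound to /argon/carbon, C#object:title:language: resolves to /argon/carbon/object:title:language:.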

The empty list

The empty list is written as ().

Nil

Nil, a value that explicitly represents the absence of a value, is written #nil.

Lists

A basic list can be written as (, zero or more whitespace characters, the value at the head of the list, one or more whitespace characters, ., zero or more whitespace characters, the rest of the list (which may not actually be a list), zero or more whitespace characters, then ).

As a shorthand, chains of lists ending in the empty list can be written as (, zero or more whitespace characters, then one or more values separated by one or more whitespace characters, then zero or more whitespace characters, then ).

And finally, chains of lists ending in any arbitrary value can be written as (, zero or more whitespace characters, then one or more values separated by one or more whitespace characters, then one or more whitespace characters, ., one or more whitespace characters, the rest of the final list, zero or more whitespace characters, then ).
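To make the relationship between the three notations concrete, here is a small Python sketch that models lists as head/tail pairs and writes each value in the shortest applicable form. Pair, EMPTY, from_items and write_list are hypothetical names, and plain Python integers stand in for arbitrary IRON values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    head: object
    tail: object  # the rest of the list, which need not itself be a list

EMPTY = ()  # stand-in for the IRON empty list

def from_items(items, tail=EMPTY):
    """Build a chain of pairs holding items and ending in tail."""
    result = tail
    for item in reversed(items):
        result = Pair(item, result)
    return result

def write_list(value):
    """Write a value, using the shorthand forms for chains of pairs."""
    if value == EMPTY:
        return "()"
    if not isinstance(value, Pair):
        return str(value)
    parts = []
    while isinstance(value, Pair):
        parts.append(write_list(value.head))
        value = value.tail
    if value != EMPTY:
        # The chain ends in some other value, so the dotted form is needed.
        parts += [".", write_list(value)]
    return "(" + " ".join(parts) + ")"
```

Here Pair(1, Pair(2, Pair(3, EMPTY))) is written (1 2 3), Pair(1, 2) is written (1 . 2), and from_items([1, 2], tail=3) is written (1 2 . 3).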

Sets

A set (unordered collection of values without repetition) is written as |(, zero or more whitespace characters, zero or more values separated by one or more whitespace characters, zero or more whitespace characters, then )|

Maps

Maps are notionally sets of pairs, with the constraint that no two pairs in the set may share the first element. They are written as { followed by zero or more whitespace characters then zero or more elements, each written as a value, one or more whitespace characters, :, one or more whitespace characters, then another value. The elements are separated by one or more whitespace characters. Finally, there are zero or more trailing whitespace characters followed by }.

Arrays

An array is an ordered matrix of values, of any dimensionality.

A single-dimensional array is written as #<, zero or more whitespace characters, zero or more values separated by one or more whitespace characters, then zero or more whitespace characters followed by >.

A multi-dimensional array is written as #[, followed by N, a finite integer greater than 1 denoting the number of dimensions of the array, then ]< followed by zero or more whitespace, a sequence of zero or more array slices of dimensionality N-1 separated by zero or more whitespace, zero or more additional whitespace, then >. Each array slice is either (if of dimensionality one) a sequence of zero or more values surrounded by < and > with zero or more whitespace characters on either side and one or more whitespace characters between each; or, if of dimensionality N which is more than one, a sequence of zero or more slices of dimensionality N-1 surrounded by < and > with zero or more whitespace characters on either side and zero or more whitespace characters between each.

For example, #<1 2 3>, #[2]<<1 2 3> <4 5 6>>.
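The recursive slice structure can be sketched as a small Python writer. The name write_array is hypothetical, nested Python lists stand in for the array, and the number of dimensions is passed explicitly because it cannot be inferred from an empty array.

```python
def write_array(data, dimensions=1):
    """Write nested lists as an IRON array with the given dimensionality."""
    if dimensions < 1:
        raise ValueError("an array needs at least one dimension")

    def write_slice(part, depth):
        if depth == 1:
            # Innermost slices hold the values themselves.
            return "<" + " ".join(str(value) for value in part) + ">"
        return "<" + " ".join(write_slice(p, depth - 1) for p in part) + ">"

    # One-dimensional arrays omit the [N] dimensionality declaration.
    header = "#" if dimensions == 1 else f"#[{dimensions}]"
    return header + write_slice(data, dimensions)
```

Calling write_array([1, 2, 3]) and write_array([[1, 2, 3], [4, 5, 6]], dimensions=2) reproduces the two examples above.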

Strings

A string is a shorthand for a one-dimensional array of characters. It's written as the sequence of characters, surrounded in " quotes, with any \ or " characters prefixed with a \.
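The escaping rule is small enough to sketch in full. In this Python illustration, write_string and read_string are hypothetical names; the reader accepts a backslash before any character, which is slightly more lenient than the rule requires.

```python
def write_string(text):
    """Quote text, prefixing every backslash and double quote with a backslash."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def read_string(literal):
    """Invert write_string, rejecting unterminated or badly escaped input."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a string must be surrounded by double quotes")
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            out.append(body[i + 1])
            i += 2
        elif body[i] == '"':
            raise ValueError("unescaped double quote")
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```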

Records

A record contains a symbol, known as the record's "type symbol", and one or more values known as "fields".

In the simplest case, where the final component of the type symbol contains no : characters, a record is written as [ followed by the type symbol, one or more whitespace characters, the field values separated by one or more whitespace characters, then ].

For example, [+ 1 2 3] (where the type symbol references the default namespace), [</foo/bar/baz> 1 2 3] (using a full symbol), or [foo#bar/bam 1 2 3] (using a namespace prefix).

However, if the final component of the type symbol contains : characters, then it must be composed of one or more "field headers", those being sequences of characters that end in a :. The record is then written as a [, followed by the type symbol but truncated after the first field header, followed by one or more whitespace characters, followed by a field value, then (if there are any more field headers in the type symbol), each field header in turn with one or more whitespace characters on either side, followed by a corresponding value - then a final ].

For example, [if: foo then: bar else: baz], where the type symbol is the current default namespace with the component if:then:else: appended; or [/argon/carbon/object: /example/foo title: "A nice name" language: "en"] for a record with a type symbol of /argon/carbon/object:title:language:.

If the final field header in a type symbol is just : on its own, then the final value must be written AFTER the closing ] - which has a : appended to it.

For example, [note: "This is bogus"]: 123 is a record with type symbol note:: and fields "This is bogus" and 123.
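The record rules above can be sketched as a Python writer that takes a type symbol and already-written field values. The name write_record is hypothetical, and the sketch assumes the type symbol contains no escaped / characters, so that its final component is everything after the last /.

```python
import re

def write_record(type_symbol, fields):
    """Write a record from its type symbol and pre-written field values."""
    head, slash, final = type_symbol.rpartition("/")
    if ":" not in final:
        # Simplest case: the type symbol, then the fields in order.
        return "[" + " ".join([type_symbol, *fields]) + "]"
    headers = re.findall(r"[^:]*:", final)  # each field header ends in ":"
    if "".join(headers) != final or len(headers) != len(fields):
        raise ValueError("field headers and field values do not match")
    trailing = None
    if headers[-1] == ":" and len(headers) > 1:
        # A bare ":" as the last header moves the last value after the "]".
        trailing = fields[-1]
        headers, fields = headers[:-1], fields[:-1]
    parts = [head + slash + headers[0], fields[0]]
    for header, field in zip(headers[1:], fields[1:]):
        parts += [header, field]
    text = "[" + " ".join(parts) + "]"
    return text if trailing is None else f"{text}: {trailing}"
```

For instance, write_record("note::", ['"This is bogus"', "123"]) produces [note: "This is bogus"]: 123.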

CAS references

A CAS reference is a value that identifies an object in some other, unspecified, storage system; as the name implies, that system is a content-addressable store, and the CAS reference is therefore some kind of hash of the object.

A CAS reference is written #cas(, followed by a sequence of characters other than : (indicating the hash algorithm used), followed by a :, then a hexadecimal integer, then ).

Comments

As a bit of syntactic sugar, a sequence of one or more instances of a newline, zero or more whitespace characters, a ;, zero or more whitespace characters, any non-newline characters (known as "content") then a newline before any value causes that value to be wrapped in a record with type symbol /argon/iron/note::. The first field value is the result of concatenating the "content" strings with a single space character between each of them, and the second field value is the value found after the comment.

If a value is followed by a sequence of one or more instances of one or more whitespace characters, a ;, zero or more whitespace characters, zero or more non-newline characters (the "content") then a newline, then it too is wrapped in a record in the same manner.

For instance, the following examples:

  ; This is very good,
  ; I like it
  123

and:

123 ; This is very good,
    ; I like it

...both represent the value that could be written [</argon/iron/note:> "This is very good, I like it" ]: 123.
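A line-based Python sketch of this sugar is given here, assuming a single value with no ; characters inside strings. The name wrap_comment is hypothetical, and the result is returned as a (type symbol, comment, value text) tuple rather than as written IRON.

```python
NOTE_TYPE = "/argon/iron/note::"

def wrap_comment(source):
    """Collect ; comments around one value and pair them with that value."""
    contents, values = [], []
    for line in source.splitlines():
        before, semicolon, after = line.partition(";")
        if before.strip():
            values.append(before.strip())
        if semicolon:
            # Whitespace after the ; is skipped; the rest is the content.
            contents.append(after.lstrip())
    value_text = " ".join(values)
    if not contents:
        return value_text  # no comment, so the value is left unwrapped
    # The content strings are concatenated with a single space between them.
    return (NOTE_TYPE, " ".join(contents), value_text)
```

Both example layouts above yield the same tuple under this sketch: the note type symbol, the text "This is very good, I like it", and the value 123.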

Namespace bindings

Any value can be prefixed with #!default-namespace, one or more whitespace characters, then a symbol which becomes the default namespace while the value is being parsed.

It can also be prefixed with #!namespace, one or more whitespace characters, a single symbol component called the prefix, one or more whitespace characters, then a symbol, which is then bound to the prefix in the context used to parse the value, shadowing any existing binding for that prefix in the current context.

Type constrained collections

Optionally, collections may constrain the types of their elements. The syntax for this is simply a # followed by a type definition (explained below), followed by a value of that type. However, as a shorthand, arrays can be type-constrained by writing the element type after the initial # - before the optional dimensionality declaration and the <. Note that it is illegal to give such a type prefix to a type other than list, set, map, or array.

Shared values

IRON values may contain other IRON values, but they are not constrained to a tree structure. Although cycles are forbidden, there can be multiple references to a single value, so IRON values can be an arbitrary directed acyclic graph.

Any value can be prefixed with #!def(, followed by a positive integer N, then ). This assigns the identifier N to the prefixed value from this point in the stream onwards.

Anywhere a value is expected, the syntax #!ref(, followed by a positive integer N, then ) can be used to represent a reference to the value previously assigned the identifier N.
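One way to picture the effect of #!def and #!ref is a resolver that walks an already-parsed value in stream order. In this Python sketch, Def, Ref and resolve_shared are hypothetical names, and Python lists stand in for IRON collections.

```python
from dataclasses import dataclass

@dataclass
class Def:
    ident: int
    value: object

@dataclass
class Ref:
    ident: int

def resolve_shared(node, table=None):
    """Replace Def/Ref markers so that references share one underlying value."""
    table = {} if table is None else table
    if isinstance(node, Def):
        value = resolve_shared(node.value, table)
        # The identifier only becomes visible once the value is complete,
        # so a value cannot refer to itself and no cycle can form.
        table[node.ident] = value
        return value
    if isinstance(node, Ref):
        if node.ident not in table:
            raise ValueError(f"reference to undefined identifier {node.ident}")
        return table[node.ident]
    if isinstance(node, list):
        return [resolve_shared(item, table) for item in node]
    return node
```

Resolving [Def(1, [1, 2]), Ref(1)] gives a two-element list whose elements are the same object, which is the directed acyclic graph shape described above.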

Type definitions

Type definitions can be any of the following: