CARBON

WORK IN PROGRESS

While IRON is the model for individual particles of data flying about an ARGON system, CARBON is the larger-scale model of data en masse. CARBON is, at heart, defined in terms of IRON (a CARBON knowledge base can be encoded entirely in IRON), and CARBON is all about providing large-scale structure for bits of IRON data; but CARBON deals with issues of scale that IRON need not concern itself with.

Knowledge Bases

To a programmer, the main facility provided by CARBON is the knowledge base, or KB for short. A KB is a set of tuples, each of which makes a statement of fact about something. The tuples are themselves just IRON records, but with some metadata attached - such as expiry timestamps for transient data, or bookkeeping information relating to merging disparate changes.

Tuples

A tuple expresses a relationship between some objects. Those objects might be IRON values, such as integers or strings of text; or they might be "distant" objects such as entities (represented by their name in the CARBON name space, which in turn is written as an IRON symbol) or "abstract" objects identified by names in the CARBON name space that don't have an entity ID associated.

Every tuple has a type, which is an IRON symbol (and, therefore, itself names a point in the CARBON name space), and is used to express the meaning of the tuple.

For instance, one might express a title of an entity like so:

#!namespace C /argon/carbon

[C#object: </example/foo>
   title: "A nice name"
   language: "en"]

This is a relationship (/argon/carbon/object:title:language:) which can bind a title to any object, in this case being used to bind a title to an entity.

To give an example of abstract objects, imagine building a Wikipedia-like database of useful facts about things at /wikipedia/. We might want to talk about love, which we might name /wikipedia/love. But there is no "love entity" that represents love in the world of ARGON; love isn't available as a software service (although you can rent a passable substitute, as they say...). But nonetheless, we might want to make some statements about love, such as providing a description of it:

#!namespace wp /wikipedia
#!namespace C /argon/carbon

[C#object: wp#love
   description: "Love is ..."
   language: "en"]

So what's special about /wikipedia/love that makes it abstract, and /example/foo that makes it refer to a concrete entity? Not much, really - they're both objects, it's just that /example/foo happens to have an EID associated with it so that you can poke at it with MERCURY to ask it to do things. The means by which an EID is associated with an object are explained below under "The Directory".

Rules

You might think that CARBON sounds a lot like a relational database, with a table called "descriptions" that has columns "Object", "Description" and "Language", and you wouldn't be far wrong.

However, the difference starts to become apparent when rules appear on the scene. Rules are themselves tuples, but tuples which allow tuples to be created on demand.

[C#rule: [individual: $X descends-from: $Y]
   if: [C#or
          [individual: $X child-of: $Y]
          [C#and
              [individual: $X child-of: $Z]
              [individual: $Z descends-from: $Y]]]]

Within a rule, symbols whose last component starts with a $ are considered variables. The above rule explains that somebody descends from somebody else if they are their child, or the child of somebody who descends from them.

Given the above rule, if CARBON is asked whether somebody descends from somebody else, it will automatically follow the rule to see if it can find a chain of "child of" relationships joining them, and if so, it will report success.
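To make that evaluation concrete, here's a minimal backward-chaining sketch in Python. It's purely illustrative - the fact encoding, names like child_of, and the example family are all made up for the sketch, not CARBON's actual representation:

# Facts are (child, parent) pairs; the two functions mirror the
# C#rule above: the base case and the recursive case.
facts = {
    ("charlie", "bob"),
    ("bob", "alice"),
}

def child_of(x, y):
    return (x, y) in facts

def descends_from(x, y, depth=0):
    if depth > 100:           # crude guard against runaway recursion
        return False
    if child_of(x, y):        # base case: $X child-of $Y
        return True
    # recursive case: $X child-of $Z and $Z descends-from $Y
    return any(descends_from(z, y, depth + 1)
               for (c, z) in facts if c == x)

print(descends_from("charlie", "alice"))  # True: charlie -> bob -> alice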

You can have rules without an if part, too; they create tuples without requiring any conditions to hold first.

[C#rule: [loves alaric $X]]

Alaric loves everything!

Objects with no useful name

Sometimes you have an object that isn't a concrete value such as a number or a string, but which also doesn't have a human-assigned global name.

For instance, an e-commerce system will need a way to identify things like orders.

The thing to do in this case is to still identify the object with a symbol, as it is not a concrete value; but to make a symbol up. This can be done by making a symbol relative to a single assigned namespace chosen for the purpose (eg, /com/mycompany/orders), named with a meaningful prefix followed by a unique number (and WOLFRAM provides us with Lamport timestamps which are guaranteed unique within the cluster). So the end result might be symbols like /com/mycompany/orders/id-2827391.
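As a sketch of that convention (the namespace is the one from the example above; the plain counter standing in for WOLFRAM's Lamport timestamps is an assumption, as the real API isn't shown in this document):

import itertools

# Stand-in for WOLFRAM's cluster-unique Lamport timestamps.
_lamport = itertools.count(2827391)

def make_order_symbol(namespace="/com/mycompany/orders", prefix="id"):
    # Mint a unique symbol like /com/mycompany/orders/id-2827391
    return f"{namespace}/{prefix}-{next(_lamport)}"

print(make_order_symbol())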

Ideally, as orders are objects that one might perform operations upon (such as cancelling), the shop entity should give its orders entity IDs (as personae of itself, or by creating independent order entities) and assign the names to them under its own namespace.

Compound objects

Although anything can be described as a list of tuples making statements about it, that's not always the most efficient way of doing so.

For instance, a bitmapped image could be described pixel by pixel, like so:

[bitmap: /example/my-face
           x: 0 y: 0
           Y: 1.0 u: 0.3 v: 0.7]

[bitmap: /example/my-face
           x: 1 y: 0
           Y: 1.0 u: 0.3 v: 0.71]

...

However, that would occupy very many bytes per pixel.

Alternatively, we could represent the entire image as a set of IRON homogeneous arrays (they'll be compressed more efficiently if we don't interleave the planes), one for each component of the pixel:

[bitmap: /example/my-face width: 640 height: 480
   Y: #float[2]<<1.0 1.0 ...> ...>
   u: #float[2]<<0.3 0.3 ...> ...>
   v: #float[2]<<0.7 0.71 ...> ...>]

But that leads to problems with concurrent updating. With the tuple-per-pixel model, two different handlers running in the same entity can update different parts of the image at once, and both sets of changes will survive; yet if a tuple update can only update the entire image in one go, then of any two updates, only one will survive. The other will be overwritten.

So what we do is to cheat. The CARBON implementation has a number of "compound object types" built into it, which are handled specially. If it sees a tuple declaring that an object is an array of floats:

[C#array: /example/my-face-Y
   type: /argon/iron/types/float
   size: (640 480)]

...then it will actually allocate space for 307,200 (640 times 480) floats in a two-dimensional array.

If it then sees tuples such as:

[C#array: /example/my-face-Y
   is: 1.0 at: (0 0)]

...it will update the packed array rather than keeping the tuple.

We could thus represent our image like so:

[bitmap: /example/my-face width: 640 height: 480
   Y: /example/my-face-Y
   u: /example/my-face-u
   v: /example/my-face-v]

[C#array: /example/my-face-Y
   type: /argon/iron/types/float
   size: (640 480)]
[C#array: /example/my-face-u
   type: /argon/iron/types/float
   size: (640 480)]
[C#array: /example/my-face-v
   type: /argon/iron/types/float
   size: (640 480)]

An array might have individual elements updated one by one, but the compound object handler also responds to tuples stating that a given range of array elements have a single value, or that a literal IRON array object represents the values of the array in a given range, for bulk updating.
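Here's a toy sketch in Python of what such a compound object handler might do internally. The class and method names are invented for illustration, and the real handler would live on top of TUNGSTEN storage rather than an in-memory array:

import array

class PackedFloatArray:
    # Packed 2-D float plane that absorbs per-element and range-update
    # tuples instead of storing each one individually.
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.data = array.array("f", [0.0] * (width * height))

    def set_at(self, value, x, y):
        # corresponds to [C#array: ... is: value at: (x y)]
        self.data[y * self.width + x] = value

    def fill_range(self, value, x0, y0, x1, y1):
        # hypothetical bulk form: one value across a whole rectangle
        for y in range(y0, y1 + 1):
            base = y * self.width
            for x in range(x0, x1 + 1):
                self.data[base + x] = value

plane = PackedFloatArray(640, 480)
plane.set_at(1.0, 0, 0)            # single-element update
plane.fill_range(0.5, 0, 0, 7, 7)  # bulk update of an 8x8 region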

The result is a lossless representation of the bitmap, something like a PNG file; we just store the image as arrays, and rely on IRON to know tricks for compactly representing numeric arrays. Does that mean that all images need to be stored losslessly under ARGON, though? What about the improved compression of a JPEG file? IRON will always store arrays as honestly as it can, but we can still do lossy compression. If one follows the example of JPEG and breaks each plane of an image into 8x8 cells, then performs a discrete cosine transform on each and quantises them differentially to reserve more bits for more important image components, you end up with an array of 8x8 arrays of small integers for each plane.

We can then represent each plane as a vector made by taking the (0,0) element from each sub-array in turn, then the (0,1), then the (1,0), and so on in a JPEG-esque serpentine transposition that will tend to move all the significant information to the start of the vector, and make the trailing end of the vector largely zero. IRON will then have little difficulty in doing a good job of losslessly compressing the result, thanks to the lossy encoding of the image.
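To illustrate the ordering, here's a sketch in Python; the 8x8 block size follows JPEG, but the example data is arbitrary:

def zigzag_indices(n=8):
    # (row, col) pairs of an n x n block in serpentine order,
    # low-frequency coefficients first
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def plane_to_vector(blocks):
    # Gather the same coefficient from every block in turn, in zigzag
    # order, so the trailing end of the vector is mostly zero.
    return [block[r][c] for (r, c) in zigzag_indices() for block in blocks]

# Two quantised blocks with only DC energy: the vector starts with the
# DC terms and is zero afterwards, which compresses well.
dc_only = [[7] + [0] * 7] + [[0] * 8 for _ in range(7)]
vec = plane_to_vector([dc_only, dc_only])
print(vec[:4], vec.count(0))  # [7, 7, 0, 0] 126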

The reason that lossy compression appears "above" the abstraction layer of CARBON is that lossy compression exists in the problem domain - it changes the data represented in ways that users need to, to some extent, be aware of.

Compound objects allow for compact representation of things that would be messy with tuples; homogeneous vectors are the best example of those. But we also have compound objects that help with parallel updates. For instance, we might be interested in counting how many times some event has occurred. The naive solution is to have a tuple containing the count:

[event: /example/foo count: 57]

When the event happens again, we look at the current count, add one, and tell CARBON:

[C#not [event: /example/foo count: 57]]
[event: /example/foo count: 58]

The C#not tells CARBON to forget the old tuple, and then a new tuple is provided with the new count.

However, if we have two such updates occurring in parallel, both of them might read 57 as the current value - and both would then write back 58 as the new count. The event has happened twice, but we've only gone up by one. And we won't even have a clue, as CARBON will not store [event: /example/foo count: 58] twice - it merges identical tuples.

However, if we declare:

[C#counter: /example/foo-counter]

Then CARBON will treat that as an event counter object, storing a numeric counter and a buffer for pending events, initialised to zero and empty.

We can then say:

[C#counter: /example/foo-counter
  event: ...some unique ID...]

If the knowledge base is not replicated, or it is but all the replicas are currently accessible, it will atomically increment the counter.

But if we are using a replicated knowledge base with inaccessible replicas, so it cannot atomically increment the counter, it will store the unique ID in the pending event buffer. When all the replicas get together again, they can merge their event buffers (removing duplicates) and atomically increment the counter by the size of the resulting set.

When CARBON is asked to satisfy a query of the form:

[C#counter: /example/foo-counter
  count: $X]

...it will add the counter to the number of entries in its pending event buffer to produce the result.

And the following tuple will hold if the event buffer is empty and all replicas are reachable:

[C#counter-is-synchronized /example/foo-counter]
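Here's a sketch of those counter semantics in Python. It's illustrative only - real replicas would exchange state via WOLFRAM, and the merge assumes both replicas agreed on the confirmed count before the partition, which holds because confirmed increments are atomic across all replicas:

class ReplicaCounter:
    def __init__(self):
        self.counter = 0      # confirmed events
        self.pending = set()  # unique event IDs awaiting a full merge

    def record(self, event_id, all_replicas_reachable):
        if all_replicas_reachable:
            self.counter += 1           # safe to increment atomically
        else:
            self.pending.add(event_id)  # buffer the event instead

    def merge(self, other):
        # Replicas rejoin: union the buffers (dropping duplicates) and
        # fold the merged set into the shared confirmed count.
        merged = self.pending | other.pending
        self.counter += len(merged)
        other.counter = self.counter
        self.pending.clear()
        other.pending.clear()

    def count(self):
        # what a [C#counter: ... count: $X] query reports locally
        return self.counter + len(self.pending)

a, b = ReplicaCounter(), ReplicaCounter()
a.record("evt-1", all_replicas_reachable=False)
b.record("evt-1", all_replicas_reachable=False)  # same event seen twice
b.record("evt-2", all_replicas_reachable=False)
a.merge(b)
print(a.count(), b.count())  # 2 2 - the duplicate was counted once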

Temporary KBs

Temporary knowledge bases can be created at will, and are stored purely in RAM. They are just IRON objects with an opaque internal structure.

Tuples are stored within them, and transparently mapped to compound objects where applicable.

Perhaps the most interesting thing about them is that they can "chain" onto other knowledge bases (of any type), which will be consulted along with the main knowledge base to satisfy queries. In the event of any conflict, the main knowledge base has priority, followed by the chained knowledge bases in the order in which they were configured.

Conflicts are detected by consulting the metadata attached to tuple type symbols, which provide rules about what other tuples conflict with tuples using that type symbol.

TODO: Re-read that book on updating logical databases and explain how to handle this!

Serialised KBs

A serialised KB can be built from a bunch of tuples (including rules), references to other serialised KBs stored in content-addressable storage and referenced by their hashes, and/or existing serialised KBs.

When created, it may optionally be encapsulated by zero or more nested layers of cryptographic signing or encryption. Existing serialised KBs may also be wrapped in additional layers.

An encryption wrapper works by choosing a securely random session key and encrypting the contents, then storing copies of the session key encrypted for any number of public keys, pre-shared keys, or secret-shared key shard systems, so that any of the supplied keys (or combinations of shards) can be used to decrypt it.
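As a minimal sketch of that scheme, covering only the pre-shared-key case and using Fernet from Python's cryptography package as a stand-in for whatever cipher suite CARBON actually specifies:

from cryptography.fernet import Fernet, InvalidToken

def wrap(serialised_kb, recipient_keys):
    session_key = Fernet.generate_key()  # securely random session key
    body = Fernet(session_key).encrypt(serialised_kb)
    # store one copy of the session key encrypted for each recipient
    wrapped = [Fernet(k).encrypt(session_key) for k in recipient_keys]
    return body, wrapped

def unwrap(body, wrapped, my_key):
    f = Fernet(my_key)
    for wk in wrapped:
        try:
            session_key = f.decrypt(wk)
        except InvalidToken:
            continue  # this copy wasn't wrapped for our key
        return Fernet(session_key).decrypt(body)
    raise ValueError("no session-key copy matches this key")

alice, bob = Fernet.generate_key(), Fernet.generate_key()
body, keys = wrap(b"[C#rule: [loves alaric $X]]", [alice, bob])
print(unwrap(body, keys, bob))  # either recipient key suffices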

Serialised KBs can be queried like any other (as long as sufficient keys are provided to undo any layers of encryption), and their signatures can be inspected. Signatures can be checked and inspected even if they contain encryption wrappers that we do not have sufficient keys for, as the signature is computed against the encrypted data stream. Successful decryption may reveal additional signatures that are contained within the encryption layer.

As a serialised KB may contain nested serialised KBs which might be separately signed or encrypted, it's possible that only partial information may be exposed unless additional decryption keys are provided. Signatures around nested serialised KBs are not listed as signatures of the whole "outer" KB, as that would imply they attest to the validity of tuples not signed by them, but it is possible to explicitly traverse the hierarchical structure of nested KBs and wrappers to find them.

TUNGSTEN mutable sections: Persistent KBs

Persistent storage of entity state in TUNGSTEN is based on the CARBON model, and uses the low-level B-Tree storage management of TUNGSTEN to present a number of knowledge bases, each corresponding to a "slice" of the entity's TUNGSTEN storage, in close cooperation with WOLFRAM. WOLFRAM will provide CARBON with tuples specifying updates that need to be made, but they might not arrive "in order"; they will be tagged with Lamport timestamps indicating the order in which they should be applied. As such, CARBON needs to store metadata alongside the tuples in TUNGSTEN, indicating what the last update timestamp was, so it can ignore updates with an earlier timestamp.
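A minimal sketch of that timestamp gate (the tuple keys and storage layout here are invented; the real thing lives in TUNGSTEN's B-Trees):

class SliceStore:
    def __init__(self):
        self.tuples = {}  # tuple key -> (value, timestamp of last update)

    def apply(self, key, value, ts):
        _, last_ts = self.tuples.get(key, (None, -1))
        if ts <= last_ts:
            # stale: an update with a later Lamport timestamp
            # has already been applied, so ignore this one
            return False
        self.tuples[key] = (value, ts)
        return True

s = SliceStore()
s.apply(("event", "/example/foo", "count"), 58, ts=12)
s.apply(("event", "/example/foo", "count"), 57, ts=9)  # arrives late
print(s.tuples)  # count is still 58; the ts=9 update was ignored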

Compound object handlers get to take over the TUNGSTEN storage of their objects, so they can use appropriately compact and updateable representations in terms of the B-Tree. They also need to store their own update timestamps for individually updateable elements of the compound object, so they can correctly maintain the current state.

CARBON on TUNGSTEN also pays attention to special metadata tuples within the slice, which can mark tuples as being temporary - either specifying an explicit drop timestamp after which that tuple should be deleted as soon as it's noticed, or a drop priority allowing TUNGSTEN to delete it if storage space is low (with lower priority tuples being dropped before higher priority tuples).

TUNGSTEN immutable sections: Snapshots of KBs

An immutable section may be created within an entity by atomically snapshotting a mutable KB into a serialised CARBON knowledge base, and storing it in the WOLFRAM content-addressable store.

These are implemented using WOLFRAM's content-addressable storage system. Note that signing a KB that is already a CAS hash just stores a small object containing the signature and the CAS reference. Encryption, however, changes the data rather than wrapping it, so creates a new object with a new hash.

Remote KBs via MERCURY

Every Entity ID (EID) points to a serialised CARBON knowledge base that can be accessed via MERCURY idempotent fetch operations.

OPEN QUESTION: Note that as the CARBON protocol opens up TUNGSTEN knowledge base sections, it needs to know an initial default namespace - so the CARBON name being used to access the entity must be supplied in requests. If there is none (eg, we are just working from a raw EID), then what do we do? Construct a nasty one from the raw EID? Or do we make having an initial default namespace optional, and create a magic anonymous per-EID namespace for symbols defined in a raw EID, so they can compare equal if from the same entity?

The MERCURY protocol for accessing CARBON data at an EID has three parts:

A "pattern" may be a tuple with $-variables in it, or a more general kind of pattern naming a symbol and requesting all tuples containing that symbol or its prefixes. As it may not always be practical to decide which rules might match the latter, general, kind, the response to a query containing such a pattern may include additional rules that the server merely thinks might match.

This distinction allows an entity to expose a downloadable package of knowledge, while declaring the presence of additional knowledge that has to be requested explicitly through gateways. Neither the protocol nor the representation of gateways specifies how gateways work, but the proposed implementation has two types of gateway, as discussed in the Developer View page.

Caching

A serialised CARBON knowledge base may contain cache-control metadata pertaining to the knowledge base as a whole, or referencing particular tuples within the knowledge base. The CARBON-over-MERCURY implementation should take note of these, caching the results of queries in the cluster's distributed cache, and avoid re-requesting knowledge bases whose answers are already available in the cache.

It could also be possible, if it seems worthwhile, to configure clusters to use a shared inter-cluster cache. In order to enable that to happen without exposing too many security risks, the CARBON-over-MERCURY protocol will sign all the serialised CARBON knowledge bases returned (with the entity's volume key) so the validity of results obtained from semi-trusted third parties can still be verified.

Peer-to-peer sharing

The CARBON-over-MERCURY protocol will (probably in some future versions, as it's not a critical feature) embed a capability similar to BitTorrent. When multiple concurrent downloads of the same serialised knowledge base are spotted by the cluster being downloaded from, it may (if the serialised form is large and popular enough to justify the overhead) opt to start suggesting to recipients that they fetch blocks from peers who already have that block.

In order to support that, the protocol for fetching particularly large objects should already include the concept of splitting the object into size-capped blocks, and offer a mechanism for requesters to optionally specify their willingness to take part in cooperative distribution when requesting an object. The server can then choose to keep track of cooperative requesters and what blocks have been sent to which, so it can then suggest to requesters a list of peer addresses that the requester might attempt to fetch the block from, reducing load on the server and its bandwidth and producing faster downloads.
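For instance, splitting a payload into hash-identified blocks might look like this sketch (the block size cap is an assumption - the protocol doesn't fix one here):

import hashlib

BLOCK_SIZE = 64 * 1024  # assumed cap; the real protocol's cap isn't given

def split_blocks(payload):
    # Split a large serialised KB into size-capped blocks, each
    # identified by its hash so peers can serve and verify them.
    blocks = [payload[i:i + BLOCK_SIZE]
              for i in range(0, len(payload), BLOCK_SIZE)]
    return [(hashlib.sha256(b).hexdigest(), b) for b in blocks]

manifest = split_blocks(b"\x00" * 200_000)
print(len(manifest))  # 4 blocks for a ~200 kB payload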

This also makes it possible for a global content distribution network (CDN) service to operate; the cluster can be configured with the address of the CDN, and for a suitable fee the CDN will then accept requests to fetch a given download from the server and be available as a cooperative peer for the server to point requesters at.

Why's it so complex?

The CARBON-over-MERCURY protocol is fairly complex, and here's my justification for why.

On one end of the spectrum, I want it to be as fast as DNS for the common case of following the series of links that let one resolve a symbol into information about it. The basic request for information about a name is a simple MERCURY "QUERY" protocol operation; as long as the request and response fit into an MTU, it can be handled as a single UDP packet in each direction, just like DNS. And bigger responses can be handled by performing an IRIDIUM connection handshake and then streaming the results block-by-block.

However, in the common case of public published data, those responses can be lifted direct from disk (or in-memory disk cache!) on any node in the cluster without needing to fire up a LITHIUM handler for the entity being asked. And those responses can be cached in the client cluster, meaning that any other requests from within the same cluster can be satisfied from the cache. And for very large responses, all the clusters that need it at the same time can cooperate in a peer-to-peer broadcast network to distribute it efficiently.

And where high demand is anticipated, you can pay the expense of setting up CDN servers around the world, which the latest static data is published to, and which clients are transparently directed to using the peer-to-peer protocol; they're basically configurable extra seeders, which new versions of the data are automatically sent to.

And in the less-common case where data isn't published in advance - it can gateway back to the parent entity to compute data on the fly, transparently to the end-user.

We're trying to cover a lot of cases here under a single unified interface. So it's a bit complex, but I think that's justified by the complexity it removes from elsewhere in the system.

The Directory

An IRON symbol is a list of strings, read as a path through a tree structure, with the root at the left and the identified symbol being the leaf, named as the final component in the list.

CARBON provides a means to resolve those names to find the ID of an entity responsible for the name.

Every cluster contains, in its configuration, a reference to a single global namespace root entity. This might be an actual EID pointing to a publicly accessible entity maintained by a suitably trusted global foundation tasked with maintaining the namespace, or it might point to a special proxy entity created in the cluster volume as part of the ARGON system, which acts as a gateway to a distributed blockchain-based name system provided by AURUM if I get around to designing such a thing.

Either way, this entity exists so that interested parties can query it via the CARBON-over-MERCURY protocol to resolve names. Names are defined in CARBON using tuples of the form:

[C#entity: $EID
   name: NAME-SYMBOL]

If we want to resolve a name /a/b/c, we query the root EID for any tuples mentioning /a/b/c (using a QUERY request with a general pattern specifying the symbol). It might know the full answer, but most likely it will only know that it has a tuple of the form:

[C#entity: ...some EID... name: /a]

(...and maybe some other metadata about the /a entity).

In which case, the client spots that it doesn't get the full name in the response, but does get an EID for a prefix of the name. So it sends the same request to the EID of /a, and might get a more specific response:

[C#entity: ...some EID... name: /a/b]

And if it sends the request to this new EID, it will hopefully get back all the information it has about /a/b/c - which may or may not include a C#entity:name: tuple pointing to the entity itself, which can then be asked for whatever information it has self-published about itself.
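The whole walk looks something like this sketch, with each "entity" mocked up as a dict from names to EIDs; a real client would issue CARBON-over-MERCURY QUERY requests instead:

root = {"/a": "eid-a"}
directory = {
    "eid-a":  {"/a/b": "eid-ab"},
    "eid-ab": {"/a/b/c": "eid-abc"},
}

def resolve(name, root_kb):
    kb, best = root_kb, None
    while True:
        if name in kb:
            return kb[name]       # full answer found
        # otherwise take the longest prefix this entity knows an EID for
        prefixes = [n for n in kb if name.startswith(n + "/")]
        if not prefixes:
            return best           # best answer so far (may be None)
        best = kb[max(prefixes, key=len)]
        kb = directory[best]      # re-issue the query to that EID

print(resolve("/a/b/c", root))    # eid-abc, after hopping via /a and /a/b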

Note that it is not compulsory for every "node" in the namespace tree to be an entity. /a might publish information about /a/b/c itself; there might not be an /a/b entity or an /a/b/c entity at all, just /a publishing information about the symbol /a/b/c.

Note also that the name an entity is accessed as is used as the default namespace for relative symbols published by it. Therefore, if two different names point to the same EID, then symbols it refers to using the default namespace (as opposed to absolute names, or names made absolute with a declared namespace prefix) will be "bound" under both naming prefixes that point to that entity. For this reason, it is recommended that when an entity refers to names "within" itself it uses default-namespace relative symbols, so the entity does not need to know its own name (which is good, as anybody can give that entity an extra name by just pointing a child of their own name at its EID), and those names will "work" wherever that entity is and however it is referenced. However, an entity is still free to make statements about objects named by absolute names.

Therefore, any entity can create an arbitrarily large subtree of objects within itself, using its own global name as a prefix, without needing to actually create entities; they can be purely informational objects, containing information but without any identity as an entity. Or the entity can attach EIDs to them that are actually just personae of its own EID; this is particularly useful for gateways to external systems, which can map the external information structure in a CARBON directory tree of objects, each of which appears as an entity acting as a gateway to behaviour that is mapped to the remote system. Or an entity may create actual entities as offspring of itself and then add them to a directory it exports, making them independent while still being children of itself in the CARBON tree.

Suggested structure for the top levels of the directory

/argon

Where ARGON system software is published from. This is delegated to a non-profit foundation (which may or may not be the same one as providing the root). FIXME: Read these documents and gather together all the CARBON names described in them, and document them fully here.

/org

Where global non-profit organisations can register their own subtrees, for a small registration fee to the foundation and a small annual fee, to cover costs and to make sure failed organisations release their names back into the pool.

/com

Where global companies can register their own subtrees, for a less small registration fee and annual fee, but still easily affordable for startups.

/gov

Where international public bodies can register their own subtrees, for no fee but needing to prove their identity.

/me

Where anyone who doesn't want to tie their identity to a particular country can register a name, for a small cost-price setup fee and an annual renewal that costs nothing and just involves confirming continued usage (and which might only be required if no other evidence of continued usage can be found automatically). There is no restriction on registration under this prefix - corporations are welcome to, but /com looks better. Registrations are strictly anonymous under /me, with all that implies.

/<ISO two-letter country code>/com|gov|me

As above, but deliberately choosing to associate with a given geographical jurisdiction. The markup on registration and renewal fees, where it exists, is reduced to reflect the narrower scope and to encourage people to use local scopes where applicable.

/example

A name that will never be bound, except perhaps to an informational marker, used purely for examples without fear of it ever clashing with anything.

/home

A name that will never be bound in the global directory, except perhaps to an informational marker, reserved as a place for a local override to be presented by user interfaces in order to display information relevant to the context the user interface is in - eg, resources inside the user agent itself, the user's own CARBON space under their user agent entity, and links to the user's bookmarked CARBON names. It also covers resources relating to the user interface device being used to interact with the user agent, such as auto-discovered resources on the local network, hardware devices attached to the user interface device, and resources administratively configured into the user interface service such as the nearest printer, information resources about the building containing the device or the organisation providing it, etc.

FLUORINE gateways

For convenience, we should give these top-level names.

Prior Art

The closest thing in the current state of the art is probably WebFinger, which offers a standardised way to request metadata about a URL.

RDF and OWL are attempts at building something similar, but they've seen limited uptake, for reasons I am prone to rambling on at length about (in summary, I think they're naive and too vague on important points, while also brimming with accidental complexity that makes them impenetrable).

Inference algorithm

Given a heap of CARBON tuples (be they directly accessible, or via MERCURY), we need an algorithm to actually answer questions.

Questions take the form of tuples with "blanks" in them; the algorithm needs to provide a list of sets of values that can fill the blanks to create provably true (with reference to the knowledge base) tuples.

To make that concrete, we might ask [loves alaric $X], and the answers might be ({$X:food} {$X:cats}).
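The easy part - matching a pattern against stored tuples, with no rules involved - is a few lines of Python (the tuple encoding is invented for this sketch):

facts = [
    ("loves", "alaric", "food"),
    ("loves", "alaric", "cats"),
    ("loves", "bob", "dogs"),
]

def match(pattern, fact):
    # Return {variable: value} bindings if the fact fits the pattern,
    # else None; $-prefixed components are the blanks.
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("$"):
            if bindings.setdefault(p, f) != f:
                return None  # same variable bound two different ways
        elif p != f:
            return None
    return bindings

def query(pattern):
    return [b for fact in facts
            if (b := match(pattern, fact)) is not None]

print(query(("loves", "alaric", "$X")))
# [{'$X': 'food'}, {'$X': 'cats'}]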

This is simple if tuples matching the pattern can be found in the knowledge base! But the presence of rules complicates matters tremendously - especially as rule bodies can contain things like C#and and C#or - and, even more so, C#not.

We cannot make the unique name assumption (UNA); the same object might be referred to by different names in different contexts. We want to avoid that where possible, by having a good way to agree on names for things in the CARBON directory, but there may be statements that two objects are the same thing - C#object:isAliasOf: - and if the algorithm encounters those, it needs to treat all tuples about any objects in that "equivalence class" as being the same for inference purposes.
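Union-find is a natural way to maintain those equivalence classes - a sketch (the names and helper functions are invented):

# Collapse C#object:isAliasOf: statements into equivalence classes
# with union-find, then canonicalise names before matching tuples.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def is_alias_of(a, b):
    parent[find(a)] = find(b)

is_alias_of("/wikipedia/love", "/example/amour")
print(find("/wikipedia/love") == find("/example/amour"))  # True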

The closed world assumption (CWA) can't generally be made for normal CARBON inference. However, code may choose to request a closed-world assumption when querying a knowledge base known to be complete. So it needs to be an optional setting that can be turned on when required.

OPEN QUESTION: I'm not sure how much we want to enforce decidability (or semidecidability). General programming languages let you write programs that crash or never complete, because making that impossible makes it difficult or impossible to solve a wide range of interesting problems. Is CARBON inference that can decide all questions a useful enough subset of CARBON? Or do we want a more general CARBON that lets us express more stuff, but some queries will hit a runtime limit and give up without an answer? Will this open security implications - given that we can combine knowledge from multiple sources, not necessarily entirely trusted, will it be possible to introduce "poison" knowledge that makes queries unanswerable, as well as simply wrong information?

Given that we deduce in an open world most of the time, I presume we can't have completeness - if we get no answers for a query, we might also get no answers for [C#not ...the same query...].

TODOs

Talk about the ability to override parts of the namespace on a cluster-wide scale, configuring a list of CARBON namespace roots, each with the EID of the entity to "splice" into the tree at that point. The actual root of the global CARBON directory is configured in at this point, by having a mapping for /.

Also talk about the subsequent ability to demand that a local copy of any given namespace subtree be kept in the cluster at all times, in effect overriding its original name to point to a snapshot stored within the cluster. Note that CHROME modules list their dependencies, which can be used to recursively local-copy them. This is used to ensure that critical resources are available "offline", and to configure the cluster to use a specific version of something rather than "the latest". This effectively overrides the remote cache-control headers with a local directive.

Such local copies do not change when upstream changes occur, but an administrator can view a list of newly available things and opt to upgrade, or downgrade when older versions are still available. This is like "installing software". A mechanism to keep the current version archived away somewhere for later downgrading, or even to fetch the current remote version direct into the archive to try later or offline, would be desirable. The user interface for this should probably take note of the /argon/iodine/document version control protocol.

The local copies are stored in one or more nominated WOLFRAM distributed caches, with an infinite replication factor and no expiry timestamp or drop priority so they are kept until otherwise specified. By default, they go to the cluster cache, so are replicated to every node.