CARBON

CARBON is a universal global namespace, directory, and resolution mechanism, similar in scope to URIs.

CARBON also provides a generic way for an entity to expose information about itself, as well as for external "directory servers" to specify information about an entity.

It does this by having both directory entities and the actual entities themselves expose a standard common interface for accessing a Prolog-esque knowledge base.

Instead of representing our terms in the Prolog form - symbol(value,value,...), such as loves(alaric,sarah) - we adopt a different approach: (value,value,...) and use CARBON global names to identify things unambiguously: (/c/gb/snell-pym/alaric /toolkit/concepts/loves /c/gb/snell-pym/sarah). This is more verbose, but we will introduce a shorthand notation later on, inspired by XML namespaces. The first item in the list is the subject, the second is the relationship, and the third onwards (if present) are the objects.

However, we still allow a knowledge base to contain both facts such as (/c/gb/snell-pym/alaric /toolkit/concepts/loves /c/gb/snell-pym/sarah) as well as inference rules, such as (?A? /toolkit/family/is-the-parent-of ?B?) if (?B? /toolkit/family/is-the-child-of ?A?).
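As a rough illustration of how facts in this form might be matched against a query pattern, here is a minimal Python sketch. It is not the CARBON implementation: the names Term, match and query are hypothetical, terms are modelled as plain tuples of strings, and only facts (not inference rules) are handled.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Term = Tuple[str, ...]      # (subject, relationship, object, ...)
Bindings = Dict[str, str]   # variable name -> bound value


def is_variable(item: str) -> bool:
    """Variables are written ?NAME?, as in the examples above."""
    return len(item) > 2 and item.startswith("?") and item.endswith("?")


def match(pattern: Term, fact: Term) -> Optional[Bindings]:
    """Return the bindings that make `pattern` equal to `fact`, or None."""
    if len(pattern) != len(fact):
        return None
    bindings: Bindings = {}
    for p, f in zip(pattern, fact):
        if is_variable(p):
            # A variable that appears twice must bind to the same value.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def query(kb: Iterable[Term], pattern: Term) -> Iterator[Bindings]:
    """Yield one set of bindings for every fact in `kb` matching `pattern`."""
    for fact in kb:
        bindings = match(pattern, fact)
        if bindings is not None:
            yield bindings


KB = [
    ("/c/gb/snell-pym/alaric", "/toolkit/concepts/loves", "/c/gb/snell-pym/sarah"),
]
```

Asking query(KB, ("?WHO?", "/toolkit/concepts/loves", "/c/gb/snell-pym/sarah")) yields a single binding of ?WHO? to /c/gb/snell-pym/alaric.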

When we are making statements about a CARBON name, the CARBON name is considered interchangeable with its value - and if that value is an EID (for the CARBON namespace can store things other than references to entities), then the EID is itself considered interchangeable with the entity. So by making a statement about a name, we can make a statement about an entity itself.

The protocol allows you to ask questions such as:

The answers will be zero or more matching terms, and optionally zero or more referrals to other EIDs that should be asked the same question.

Due to the name/value/entity equivalence, one could also phrase those questions in terms of the actual entity ID of /c/gb/snell-pym/alaric, which would be the ID of an IODINE user agent entity representing me to the system, and the results should be the same.

This protocol can be used in five ways. Well, it can be used in any number of ways, but we consider five for now.

You will notice some similarities to RDF here. What we add in particular are the ability to have terms with an arity other than three, and the idea of a standard resolution mechanism for arbitrary queries by making all symbolic references indexable into the global directory. Also, we specify a query protocol rather than a data format, allowing the directory nodes and entities to perform inferences to answer the query - this means we can write an entity that can answer questions about addition, by actually doing arithmetic.

The name space

The global CARBON name space divides the world of global knowledge into a tree structure.

The top levels of the name space will be provided by the root node entity cluster, maintained by yours truly. If ARGON takes off, we will form a special non-profit ARGON Foundation to look after the maintenance of the root nodes of the tree, including the useful /toolkit branch.

The structure of the top levels of the tree is:

/org/...
Global organisations. You lease a name under this node for an annual fee, like a top level domain name.
/c/...
Countries. Under this node can be found a node for each ISO country code. The metadata about the country itself should be provided by each local naming authority.
/c/*/...
Organisations within a country can lease names that are valid just within that country. How names under here are allocated is up to the local naming authority appointed for that country in the long term; in the short term, the ARGON Foundation will look after them until regional representatives can be appointed. Here can be found metadata about the country itself, as well as links to organisations within that country. There can also be regional names allocated within the country - eg, in the USA, one might reserve all the two-letter state codes and allocate them to state-level naming authorities; those in turn might be subdivided by city names, as well. Either way, organisations may lease names at any point in the hierarchy, depending on their scope. Each region node should have useful metadata about that region, too.
/region/...
Regions. Names under here are like "europe" or "asia". This offers a level between country-specific names and global names, where regional organisations can register names. Similar rules apply as for /c. The countries themselves are kept out of the regions to avoid big renames when they join or leave the EU. Regional naming authorities should provide useful metadata in the directory about their regions.
/toolkit/...
Under this name live all sorts of useful things: generally abstract concepts such as love, which have neither an entity ID nor a value, but have lots of useful metadata. This is to be maintained by the ARGON Foundation.

Maintaining your own space

Note that as well as all this global stuff, the CARBON namespace also manages your own local storage. Any storage area entity (be it a whole volume, or just an entity representing a single user's space on a shared volume) will maintain a number of entities contained "within" it. As well as providing for administrative control over those entities, it will also contain metadata about them, which it reveals via CARBON. It may well let the user classify them into a tree structure of nodes, rather than just having them all at the "top level", thus giving them global names (relative to the global name assigned to the storage area entity itself).

However, as well as providing this tree structure, it will also let you store raw values, as well as arbitrary metadata about entities.

Local namespaces

As well as the global namespace, a given cluster, node, or context may provide local namespaces.

Local names are identified by the name of the namespace, a dollar sign, then the actual name.

The namespaces are "shortcuts" to locations within the global namespace.

For example, within an organisational workstation cluster for Warhead Ltd, the administrators may configure the cluster to resolve warhead$... by looking up the local name under /c/uk/warhead/internal, meaning one can refer to the nice colour printer as warhead$cprinter rather than /c/uk/warhead/internal/cprinter.

Also, a node that has access to one or more browsable local area network systems (Apple's Rendezvous, an SMB local neighbourhood, Bluetooth discovery, UPnP, etc) might provide the rendezvous$, netbios$, upnp$, and bluetooth$ namespaces, as well as the generic lan$ namespace which searches all of them. These would be implemented by the node entity providing a CARBON directory service listing the resources available to that node, with virtual entities (the node entity with a persona field) providing global gateway access to these local resources, and then the local name prefixes being registered on the node as aliases to the appropriate points within the global directory.

And a user may configure their user interface so that when they enter me$home, it looks in their bookmarks list (which will be available in the CARBON global namespace under their user agent entity's own subnodes, albeit probably with an ACL so that only the user agent or authorised proxies can view it) for the bookmark they have named home.

Any local-namespace name can be "expanded" into the full global name, when required. They exist as a user interface utility, and to allow software to easily find a list of "local" printers, say, as a starting point to allow the user to choose a printer.
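The expansion step might look something like the following Python sketch. The function name expand_local_name and the table LOCAL_NAMESPACES are assumptions made for illustration; in a real system the table would come from the cluster, node, or user configuration rather than a literal dictionary.

```python
from typing import Mapping

# Hypothetical configuration: local namespace name -> global name it aliases.
LOCAL_NAMESPACES = {
    "warhead": "/c/uk/warhead/internal",
}


def expand_local_name(name: str, namespaces: Mapping[str, str]) -> str:
    """Expand namespace$name into a full global name.

    Global names (leading slash) and names with no dollar sign are
    returned unchanged, so the caller can deal with them separately.
    """
    if name.startswith("/"):
        return name
    prefix, dollar, rest = name.partition("$")
    if not dollar:
        return name
    if prefix not in namespaces:
        raise LookupError("unknown local namespace: " + prefix + "$")
    return namespaces[prefix].rstrip("/") + "/" + rest
```

With the table above, expand_local_name("warhead$cprinter", LOCAL_NAMESPACES) gives /c/uk/warhead/internal/cprinter.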

Gateway namespaces

Gateway names are identified by a gateway name, then a colon, then a gateway-specific name string. This is designed to allow compatibility with Internet URIs.

These are part of the FLUORINE interoperability framework. They are special, in that the names are handled by passing the name to special resolver code for each namespace.

The cluster configuration database contains a list of installed gateway namespaces, and the code to resolve them.
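To make the dispatch concrete, here is a small Python sketch of a table of installed gateway namespaces. Everything in it is hypothetical: GatewayRegistry stands in for the cluster configuration database, and the resolver installed in the usage note is a stub that merely echoes its input rather than real FLUORINE code.

```python
from typing import Callable, Dict

# A resolver takes the gateway-specific part of the name (after the colon).
GatewayResolver = Callable[[str], object]


class GatewayRegistry:
    """Stand-in for the list of installed gateway namespaces."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, GatewayResolver] = {}

    def install(self, gateway: str, resolver: GatewayResolver) -> None:
        self._resolvers[gateway] = resolver

    def resolve(self, name: str) -> object:
        """Split gateway:rest and hand `rest` to that gateway's resolver."""
        gateway, colon, rest = name.partition(":")
        if not colon:
            raise LookupError("not a gateway name: " + name)
        if gateway not in self._resolvers:
            raise LookupError("no gateway installed for: " + gateway)
        return self._resolvers[gateway](rest)
```

For example, after registry.install("http", lambda rest: ("http-stub", rest)), resolving http://example.org/ passes //example.org/ to the stub.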

Browsing

A command-line interface will maintain a "current directory" node, which names without a prefixed slash or a local namespace name are resolved relative to, while names with a prefixed slash are resolved starting from the root and names with a local namespace are handled appropriately. If a relative name is not found in the "current directory", then a configurable search path may be brought into play.
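A sketch of that lookup order in Python follows. The function resolve_cli_name and the exists callback are assumptions for illustration (in practice exists would be a CARBON query), and local namespace names are assumed to have been expanded to global names before this function is called.

```python
from typing import Callable, Optional, Sequence


def resolve_cli_name(
    name: str,
    current_directory: str,
    exists: Callable[[str], bool],
    search_path: Sequence[str] = (),
) -> Optional[str]:
    """Resolve a name typed at the command line to a global name.

    Names with a leading slash are resolved from the root. Other names
    are tried under the current directory first, then under each entry
    of the search path in order. Returns None if nothing is found.
    """
    if name.startswith("/"):
        return name
    for base in [current_directory, *search_path]:
        candidate = base.rstrip("/") + "/" + name
        if exists(candidate):
            return candidate
    return None
```

So with a current directory of /c/gb/snell-pym and a search path containing /toolkit/concepts, typing loves would fall through to /toolkit/concepts/loves if no /c/gb/snell-pym/loves exists.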

We can provide tab completion to aid in CLI use.

For graphical use, we can either provide a menu showing the tree structure (useful on mobile phone-scale displays with directional controllers), or provide a conventional "windows containing icons" representation of each node in the tree. In which case, we query the nodes for all of their children, and ask for the names of the children as well as an icon. Note also that we can ask the node for information about itself, such as a longer more descriptive title, or a link to documentation on the use of the node, as well as optional background imagery for the node window (in vector form), and other such styling information.

However, we could also provide a two-dimensional zooming interface. In this, the root node of the tree is represented as a region of space. Each child node of the root node is represented by a subregion of this; if the directory contains metadata about how the space should be split up then that is used to allocate the subregions, otherwise they are automatically assigned. If a subregion is small enough, then a thumbnail or icon is obtained from the directory metadata and drawn inside it (scaled to fit the subregion), but when it gets large enough to resolve detail, then the child nodes can be recursively displayed in their subregion. This way the tree can be navigated just by panning, zooming in, or zooming out. We can dispense with the need for windows; when a subregion allocated to a value needs to be displayed, just use the NEON rendering engine to display the value therein. If it's an entity, then contact it for its own NEON graphical front end and give it the subregion to work in.
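The automatic assignment of subregions is not specified here, so the following Python sketch just picks one plausible scheme: a near-square grid. Rect, allocate_subregions, should_expand and the min_size threshold are all hypothetical names, and a real implementation would first consult any layout metadata held in the directory.

```python
import math
from typing import List, NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def allocate_subregions(region: Rect, child_count: int) -> List[Rect]:
    """Split `region` into a near-square grid with one cell per child."""
    if child_count <= 0:
        return []
    cols = math.ceil(math.sqrt(child_count))
    rows = math.ceil(child_count / cols)
    cell_w = region.w / cols
    cell_h = region.h / rows
    return [
        Rect(
            region.x + (i % cols) * cell_w,
            region.y + (i // cols) * cell_h,
            cell_w,
            cell_h,
        )
        for i in range(child_count)
    ]


def should_expand(subregion: Rect, min_size: float) -> bool:
    """Show children recursively once the subregion can resolve detail;
    below the threshold the caller draws a thumbnail or icon instead."""
    return min(subregion.w, subregion.h) >= min_size
```

Applying allocate_subregions recursively to every subregion for which should_expand is true gives the pan-and-zoom behaviour described above.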

Similarities and differences with RDF

I see that global names are always resolvable, like URLs. What about URNs?
Unlike RDF, in CARBON, we reason about arbitrary IRON values. So we can make statements about an object of type UUID, if we wish. We don't need to expand the namespace to handle this kind of thing, since we're not constrained to reasoning about names. We can make statements about numbers, strings, etc.
How about anonymous nodes?
We try to avoid them. In RDF, an anonymous node is usually still identified by an "inverse functional property" - some kind of primary key, which then serves to identify the object so that two anonymous nodes can be shown to be the same thing. I'd rather people just convert the IFP to an identifier for the anonymous node. For uses of anonymous nodes that do not have an IFP identifying them, then just pick a UUID and use that to identify them.
The equivalence between EIDs and entities can be explained by saying that the EID is just a pointer, and has no useful information about it that isn't just a property of the entity. And you can get away with saying that the global name of an entity can be used to refer to the actual entity, because you can make statements about the name itself by making statements about a string containing the name. And you can give names to abstract concepts like "love" by creating a name that does not have a value or an entity ID, and then reasoning about the name. But can you really get away with reasoning about a person by making statements about the name or EID of their user agent?
Yes :-) Consider what a person looks like in CARBON:
(/c/gb/snell-pym/alaric /toolkit/people/is-named "Alaric Snell")
(/c/gb/snell-pym/alaric /toolkit/people/contact
    "mobile" #/toolkit/people/contact/telephone-number("+44..."))
(/c/gb/snell-pym/alaric /toolkit/directory/has-eid
    <EID of my user agent>)
Basically, the name refers to the person - and the person has an EID. But entities have EIDs, not people, surely? Ah, but the entity is the representation of that person in the ARGON... it's like a proxy to the actual person. This is a problem with RDF, since in the world of URIs, there are lots of URLs that are associated with the person but are arguably not the person: their home page, their blog, their SMTP mailbox, the URLs of RDF files making FOAF statements about them... one has to create an anonymous node that is linked to all those URLs to really capture the actual "person".

APIs

There are two interfaces to the CARBON client libraries; the low level one is an inference engine, which can operate either on an in-memory knowledge base or on a persistent knowledge base (which is a TUNGSTEN storage class in itself).

Assertions are specified using patterns. Eg, one may state (/c/gb/snell-pym/alaric /toolkit/concepts/likes-to-eat ??) to state that Alaric likes to eat everything (which isn't true, by the way. I'm vegetarian.)

Inference rules may be specified in terms of recursive queries in the normal Prolog style, or by reference to arbitrary CHROME code. Eg, for an arithmetic module, one can specify that (?X? /toolkit/arith/is-the-sum-of ?Y? ?Z?) can be solved (if Y and Z are known) by summing them and binding X to that; or if X and Y are known, binding Z to X-Y; and so on.

Normal inference rules are simply specified as an assertion about an assertion:

((?PERSON? /toolkit/concepts/likes ?FOOD?) /toolkit/logic/is-implied-by
   (?PERSON? /toolkit/concepts/likes-to-eat ?FOOD?))

We can use this to express useful tools for expressing relationships between relationships:

(
	(
	  (?SUBJECT? ?PARENT? ?OBJECTS?...)
	  /toolkit/logic/is-implied-by
	  (?SUBJECT? ?CHILD? ?OBJECTS?...)
	)
	/toolkit/logic/is-implied-by
	(?CHILD? /toolkit/logic/is-a-specialisation-of ?PARENT?)
)

What this means is that if we write (/toolkit/concepts/likes-to-eat /toolkit/logic/is-a-specialisation-of /toolkit/concepts/likes), then the system can assume that the statement (?SUBJECT? /toolkit/concepts/likes-to-eat ?OBJECT?) implies (?SUBJECT? /toolkit/concepts/likes ?OBJECT?). Therefore, if the system was asked what somebody likes, and it found the statement that the person in question likes to eat cheese, it would be able to deduce that the person likes cheese.
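The deduction can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the specialisation rule is hard-coded into the hypothetical function objects_of rather than being read from the knowledge base as an is-implied-by assertion, and terms are plain tuples of strings.

```python
from typing import Iterator, List, Optional, Set, Tuple

Term = Tuple[str, ...]

SPECIALISATION = "/toolkit/logic/is-a-specialisation-of"
ALARIC = "/c/gb/snell-pym/alaric"
LIKES = "/toolkit/concepts/likes"
LIKES_TO_EAT = "/toolkit/concepts/likes-to-eat"

KB: List[Term] = [
    (LIKES_TO_EAT, SPECIALISATION, LIKES),
    (ALARIC, LIKES_TO_EAT, "/toolkit/nouns/cheese"),
]


def objects_of(
    kb: List[Term],
    subject: str,
    relation: str,
    seen: Optional[Set[str]] = None,
) -> Iterator[Tuple[str, ...]]:
    """Yield the objects of every (subject relation ...) statement,
    including statements made using a specialisation of `relation`."""
    seen = set() if seen is None else seen
    if relation in seen:        # guard against circular specialisations
        return
    seen.add(relation)
    # Statements that use the relation directly.
    for fact in kb:
        if len(fact) >= 2 and fact[0] == subject and fact[1] == relation:
            yield fact[2:]
    # Statements that use any relation declared to specialise it.
    for fact in kb:
        if len(fact) == 3 and fact[1] == SPECIALISATION and fact[2] == relation:
            yield from objects_of(kb, subject, fact[0], seen)
```

Asking list(objects_of(KB, ALARIC, LIKES)) finds the cheese, even though the knowledge base contains no direct "likes" statement.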

Computational inference rules are also specified using an assertion relating a template to a list of dictionaries. Each dictionary lists one or more of the variables in the pattern, and maps them to expressions that can be used to extract their values using the variables NOT bound in that dictionary. In other words, each dictionary in the list gives a number of the variables that can be computed, given the values of the other variables. For example, for addition:

((?A? /toolkit/arith/is-the-sum-of ?B? ?C?) /toolkit/logic/can-be-found-by
	[
	{ ?A? : #/toolkit/chrome/expression((+ ?B? ?C?)) },
	{ ?B? : #/toolkit/chrome/expression((- ?A? ?C?)) },
	{ ?C? : #/toolkit/chrome/expression((- ?A? ?B?)) }
	])

That states that ?A? can be found, given ?B? and ?C?, by addition; that ?B? can be found, given ?A? and ?C?, by subtraction; and that ?C? can be found, given ?A? and ?B?, also by subtraction.
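Here is how an inference engine might evaluate such a rule, sketched in Python rather than CHROME. The names ADDITION and solve are hypothetical, and Python lambdas stand in for the embedded expressions; the list-of-dictionaries shape mirrors the rule above.

```python
from typing import Callable, Dict, List, Optional, Sequence

Values = Dict[str, float]
Rule = List[Dict[str, Callable[[Values], float]]]

# One dictionary per way of computing some variables from the others.
ADDITION: Rule = [
    {"A": lambda v: v["B"] + v["C"]},
    {"B": lambda v: v["A"] - v["C"]},
    {"C": lambda v: v["A"] - v["B"]},
]


def solve(rule: Rule, variables: Sequence[str], known: Values) -> Optional[Values]:
    """Use the first dictionary whose inputs are all known.

    Returns the completed bindings, or None if no dictionary applies or
    if a computed value contradicts a value that was already known.
    """
    for entry in rule:
        inputs = [v for v in variables if v not in entry]
        if not all(v in known for v in inputs):
            continue
        result = dict(known)
        for var, expression in entry.items():
            value = expression(known)
            if var in known and known[var] != value:
                return None     # eg, asked to confirm that 6 = 2 + 3
            result[var] = value
        return result
    return None
```

For instance, solve(ADDITION, ("A", "B", "C"), {"A": 5, "C": 3}) skips the first dictionary (B is unknown) and uses the second to bind B to 2.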

This can be used to access real-time data sources, too, since the inference engine carries a caller-supplied linear state variable, optionally supplied to embedded expressions, that can be used to encapsulate the World.

There is a special relation that suggests that a certain entity may be contacted via the CARBON protocol in order to try to resolve a query pattern:

(?PATTERN? /toolkit/logic/may-be-resolved-by ?ENTITY-ID?)

Then there are the high-level interfaces; one is the global name resolver, which uses the inference engine starting with the root directory node to resolve name lookups.

Finally, a trivial function will exist to generate a CARBON protocol server implementation for an entity, given a knowledge base to export.

Details of the inferencing engine

At any one time, the inferencing engine has various data sources it can examine.

To Do

I need to talk about caching. The CARBON protocol needs a way of specifying caching information on responses to queries. Then the knowledge bases need ways of saying that I don't mind if my name is cached for months, since it changes only a few times in my lifetime, while the current temperature on my CPU ought not to be cached for more than thirty seconds, tops.

This needs to supersede the old MERCURY caching mechanism - anything that could be cached in MERCURY really ought to be handled by exporting a CARBON knowledge base interface, now. I need to hunt down references to MERCURY caching and eliminate them, perhaps? CARBON adds a layer on top of MERCURY for idempotent data access, so MERCURY need not worry about it...

I need to fully define the CARBON protocol. As well as the query interface above, it needs a way that a client can open a connection to have notifications of changes streamed to it, since many applications demand that.

The result of a query via the CARBON protocol can include complete solutions (when the remote entity can produce a complete set of variable bindings to satisfy the query pattern), partial solutions (where it can supply some bindings for a potential solution, but there is still an unresolved query pattern that cannot be fully solved using the information available to that entity), and referral suggestions.

An example of where partial solutions are necessary would be the above example of liking foods. Imagine an empty knowledge base is asked:

(/c/gb/snell-pym/alaric /toolkit/concepts/likes ?WHAT?)

The inferencer will not find any matches in the empty knowledge base itself, obviously. It will start by sending the query to /c/gb/snell-pym, along with the query (/c/gb/snell-pym/alaric /toolkit/directory/has-eid ?EID?). The directory service for snell-pym probably won't know much about my likes, so will instead just respond to the second query with the fact that I have an EID.

My user agent entity will be able to respond with some information about things I like. It also happens to know (/c/gb/snell-pym/alaric /toolkit/concepts/likes-to-eat /toolkit/nouns/cheese), but does not immediately know that this is relevant, and is not required (when fulfilling a remote query) to go to the effort of looking up any possible relationship between that assertion and the query in question.

The inferencer will then ask /toolkit/concepts/likes what it can say about the matter. It happens to know that likes-to-eat is a specialisation of likes, but it does not have any information about what Alaric likes - so it will return a partial solution, with no variables bound, but the query (/c/gb/snell-pym/alaric /toolkit/concepts/likes-to-eat ?WHAT?), made by applying the inference rule to the original query. The inferencer will then attempt to solve the new partial solution to find more actual solutions, restarting the process from the top. This time, when it asks my user agent, it gets a hit - my user agent knows that I like to eat cheese, thus producing a full solution: {?WHAT? : /toolkit/nouns/cheese}.

The inferencer outputs this solution, but keeps on looking. It will ask /toolkit/concepts/likes-to-eat if it can shed any further light on the matter, perhaps providing a referral to an online database of the culinary tastes of celebrities, or some more relevant specialisation properties.

When the inferencer has exhausted all its options, it will just return a (possibly empty!) list of full solutions. The list is streamed to the consumer like a coroutine; the inferencer does not actually look for the next solution in the list until the consumer actually re-invokes it to request the next solution, in case just one solution was enough.
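The walk-through above, including the lazy streaming of solutions, can be sketched with a Python generator. This is a deliberately simplified model, not the real protocol: Response, user_agent, likes_node and solve are hypothetical names, the two "remote" sources are local stub functions, the user agent stub answers only the likes-to-eat query, and referral suggestions are carried in the response but not followed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Term = Tuple[str, ...]
Bindings = Dict[str, str]

ALARIC = "/c/gb/snell-pym/alaric"
LIKES = "/toolkit/concepts/likes"
LIKES_TO_EAT = "/toolkit/concepts/likes-to-eat"
CHEESE = "/toolkit/nouns/cheese"


@dataclass
class Response:
    solutions: List[Bindings] = field(default_factory=list)
    partials: List[Tuple[Bindings, Term]] = field(default_factory=list)
    referrals: List[str] = field(default_factory=list)  # not followed here


def user_agent(query: Term) -> Response:
    """Stub for my user agent: it only knows what I like to eat."""
    if query == (ALARIC, LIKES_TO_EAT, "?WHAT?"):
        return Response(solutions=[{"?WHAT?": CHEESE}])
    return Response()


def likes_node(query: Term) -> Response:
    """Stub for /toolkit/concepts/likes: rewrites the query, binds nothing."""
    if len(query) >= 2 and query[1] == LIKES:
        return Response(partials=[({}, (query[0], LIKES_TO_EAT) + query[2:])])
    return Response()


def solve(query: Term, sources: Sequence[Callable[[Term], Response]]) -> Iterator[Bindings]:
    """Lazily yield full solutions, re-queuing partial solutions."""
    pending = deque([({}, query)])
    seen = {query}
    while pending:
        bindings, current = pending.popleft()
        for ask in sources:
            response = ask(current)
            for solution in response.solutions:
                yield {**bindings, **solution}
            for partial_bindings, remaining in response.partials:
                if remaining not in seen:
                    seen.add(remaining)
                    pending.append(({**bindings, **partial_bindings}, remaining))
```

Because solve is a generator, calling next() on it does only as much asking as is needed to produce one solution; the remaining sources are not consulted until the consumer requests another.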

This is in contrast to the more limited inferencer used to satisfy remote queries via the CARBON protocol. This is both in order to put the burden on the client making queries rather than the server providing information, and also because the server does not have sufficient information to fully use all of its inference rules. Quite often, an assertion held in the knowledge base being queried, when combined with an inference rule found on a remote server, will produce a solution to a query. However, the remote server cannot know what statements are in the original knowledge base, so the potential solution embodied by the inference rule must be passed back for further analysis.

I need to explain some use cases, for people less familiar with logical knowledge representation and/or the Semantic Web ideals. Eg, a user interface browsing the directory may glean information from the directory nodes themselves, from the entity being referenced itself, and maybe also from a user-configured "ratings database" that provides third-party reviews of the service offered, or something like that.

Also, we should talk more about reasoning about relationships. /toolkit/concepts/loves may have general statements about the love relationship, such as declaring that it also implies a "likes" relationship, a link to a dictionary definition, declarations on the types of objects for which this relationship makes sense (like the way that in RDF, one defines a range and a domain for a relationship), and so on. Basically, as well as the core "resolution" system, consider how we can store information of use to knowledge base browsers and editors...