Security
There's no shortage of pundits bemoaning the poor security of current operating systems. When I get time, I'll start to fill this page with links to things like the report on the improvements the Multics operating system needed to bring its security up to what the authors considered an acceptable standard. That report was recently revisited, with the note that most modern OSes don't even have all the security features Multics had, let alone the improvements it needed beyond that.
This saddens me. So for ARGON I've designed what I hope is a good security model.
Security within the cluster
As recommended by the US Department of Defense 'Orange Book' (Trusted Computer System Evaluation Criteria) security guidelines, we provide both mandatory and discretionary access control.
Every entity is assigned a classification, chosen from a list of classifications defined cluster-wide by a security administrator. One site may use just "High" and "Low"; another might use a complex classification hierarchy. The names of the classifications do not matter to the system (although they are of vital importance externally, due to the psychological connotations they carry for human users!). All the system cares about is that the classifications have names and are ordered: any two classifications are either equal, or one is higher than the other, and the ordering is transitive. Two classifications with the same name must be equal, but two classifications with different names may or may not be equal; it might be desirable to provide different classifications that the system considers equivalent, but that carry subtle distinctions for humans.
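Here's a minimal sketch of such an ordering, assuming the administrator gives each classification name an integer rank. Ranking the names, rather than ordering the names themselves, makes "different names, same level" fall out naturally. `ClassificationTable` and its methods are hypothetical, not part of ARGON.

```python
class ClassificationTable:
    """Hypothetical cluster-wide table mapping classification names to ranks.

    Names that share a rank compare as equal, so distinct names can be
    equivalent to the system while meaning different things to humans.
    """

    def __init__(self, ranks: dict[str, int]) -> None:
        self._ranks = dict(ranks)

    def rank(self, name: str) -> int:
        # Raises KeyError for a name the administrator never defined.
        return self._ranks[name]

    def equal(self, a: str, b: str) -> bool:
        return self.rank(a) == self.rank(b)

    def higher(self, a: str, b: str) -> bool:
        return self.rank(a) > self.rank(b)

    def at_most(self, a: str, b: str) -> bool:
        # True if classification a is the same as, or lower than, b.
        return self.rank(a) <= self.rank(b)


table = ClassificationTable({"Public": 0, "Internal": 1, "Staff Only": 1, "Secret": 2})
print(table.equal("Internal", "Staff Only"))  # True: different names, same rank
print(table.higher("Secret", "Staff Only"))   # True
```

Transitivity comes for free from comparing integers, so the administrator can't accidentally define a cycle.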
Classifications are also assigned to volumes, where they act as a minimum classification level for every entity within the volume. The effective classification of an entity is the higher of its volume's classification and the classification assigned to the entity itself.
The rule of mandatory access control is that information may normally only flow upwards through the classification levels. Downward flow must sometimes occur too, of course, but when it does, it must be closely controlled and authorised.
Therefore, each node must also be given a classification. Only entities of that classification or lower may be stored on that node. Since all entities in a volume are potentially stored on any node mirroring that volume, the system will not allow an entity to be assigned a higher classification level than the lowest classification of all the nodes mirroring the volume containing the entity.
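The floor set by the volume and the ceiling set by the mirroring nodes can be sketched together like this. It assumes classifications are plain integer ranks (higher meaning more sensitive); `Volume`, `ceiling`, `effective_classification` and `can_assign` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical view of a volume; classifications are integer ranks."""

    classification: int            # floor for every entity in the volume
    mirror_node_levels: list[int]  # classifications of the nodes mirroring it

    def ceiling(self) -> int:
        # Any mirroring node may store any entity in the volume, so no entity
        # may exceed the least-trusted mirror. Raises ValueError if no mirrors.
        return min(self.mirror_node_levels)

    def effective_classification(self, assigned: int) -> int:
        # The higher of the volume's classification and the entity's own.
        return max(self.classification, assigned)

    def can_assign(self, assigned: int) -> bool:
        return self.effective_classification(assigned) <= self.ceiling()


vol = Volume(classification=1, mirror_node_levels=[2, 3])
print(vol.effective_classification(0))  # 1: raised to the volume's floor
print(vol.can_assign(3))                # False: the level-2 mirror isn't cleared
```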
The cluster's network topology is described to the system as a tree of networks, with nodes as the leaves. In other words, starting with a list of all the nodes, an administrator would mark each group of nodes that share a LAN, replacing the group with a single entry for that LAN. Then LANs and nodes can be grouped together according to the higher-level networks and backbones that connect them, until finally one is left with a list of LANs and nodes that are joined purely by the public Internet. We might group these further by approximate network locality (eg, by ISP then by country), to help guide the WOLFRAM group messaging towards a better spanning tree.
But we can also tag interior nodes of the network topology tree with a classification. This is the highest classification of traffic that we trust that network to carry without encryption. If no classification is specified, no trust is assumed, which means that ALL traffic across that network is encrypted (using an encryption algorithm appropriate to the classification of the traffic; more on that later). The classification of a network should not normally be higher than that of any of its child networks, since that would imply a trusted network acting as the backbone to a less trusted one, but it's conceivable that such topologies may arise. Likewise, the classification of a network should not normally be higher than the classification of any node attached to it, unless we can be very sure that the node has no way to snoop other traffic on the network (note that even a switched network may be prone to spoofing, if a compromised node emits packets with spoofed MAC addresses to make the switch divert traffic for the target node to it).
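One way to apply these trust tags is sketched below, under the assumption that a message needs encryption if any network on its route is either untagged or tagged below the message's classification. Each node is represented here by the network it's attached to, and the route is found by climbing to the lowest common ancestor in the topology tree. `Network`, `networks_on_path` and `needs_encryption` are hypothetical, and classifications are integer ranks.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare networks by identity, not by field values
class Network:
    """A hypothetical interior node of the topology tree (LAN, backbone, ...)."""

    name: str
    trust: Optional[int] = None       # None means untrusted: always encrypt
    parent: Optional[Network] = None  # None for the root (eg, the Internet)

    def ancestors(self) -> list[Network]:
        """This network followed by every network above it, up to the root."""
        chain: list[Network] = []
        net: Optional[Network] = self
        while net is not None:
            chain.append(net)
            net = net.parent
        return chain


def networks_on_path(a: Network, b: Network) -> list[Network]:
    """Networks crossed between a node attached to `a` and one attached to `b`.

    Traffic climbs from `a` to the lowest common ancestor, then descends to `b`.
    """
    above_b = b.ancestors()
    path: list[Network] = []
    for net in a.ancestors():
        path.append(net)
        if net in above_b:
            lca = net
            break
    else:
        raise ValueError("networks are not in the same topology tree")
    for net in above_b:
        if net is lca:
            break
        path.append(net)
    return path


def needs_encryption(message_level: int, a: Network, b: Network) -> bool:
    # Encrypt unless every network on the route is trusted at this level.
    return any(net.trust is None or net.trust < message_level
               for net in networks_on_path(a, b))


internet = Network("internet")                        # no trust tag at all
campus = Network("campus", trust=1, parent=internet)  # backbone
lab_a = Network("lab-a", trust=2, parent=campus)
lab_b = Network("lab-b", trust=2, parent=campus)
```

With this topology, level-2 traffic inside `lab_a` goes in the clear, but the same traffic between `lab_a` and `lab_b` is encrypted because it crosses the less-trusted campus backbone.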
The concept of trusted networks is mainly for efficiency: if we have a group of nodes sharing a LAN in a room together, gaining access to that LAN is about as hard for an attacker as taking over one of the nodes anyway, so the benefit of encrypting the traffic across that LAN is probably not worth the cost. Traffic leaving that LAN, however, passes through higher levels of the topology tree, so it may end up being encrypted if required.
Every message transmitted via MERCURY has a classification level. We will discuss the implications of this for communications between clusters later; within the cluster, MERCURY runs alongside WOLFRAM on the same UDP port, so that in-cluster traffic can be packet-filtered differently from inter-cluster traffic.
By default, the classification level of a message is the classification of the entity sending it. This can be overridden, but only if the entity explicitly does so. The node receiving a message will reject it if the message's classification is higher than the classification of the receiving entity, since that entity is not cleared to receive it.
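A minimal sketch of the default-and-override rule plus the receiver's check, with classifications as integer ranks. The `downgrade_authorised` flag is an assumption standing in for whatever mechanism authorises downward flow; `message_level` and `accept` are hypothetical names.

```python
from typing import Optional


def message_level(sender_level: int, override: Optional[int] = None,
                  downgrade_authorised: bool = False) -> int:
    """Classification of an outgoing message (integer ranks, hypothetical API)."""
    if override is None:
        return sender_level  # default: the sending entity's classification
    if override < sender_level and not downgrade_authorised:
        # Downward flow must be explicitly authorised.
        raise PermissionError("downgrading a message requires authorisation")
    return override


def accept(level: int, receiver_level: int) -> bool:
    # The receiving entity must be cleared for the message's classification.
    return level <= receiver_level
```

Upgrading a message is always allowed here, since information flowing upwards needs no special control.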
In fact, an in-cluster message with a classification higher than that of a node will not even be comprehensible to that node, since each classification level within the system has its own public key pair and shared secret. The private key and shared secret of each classification level are known only to nodes at that level or above, while the public keys are known to all nodes. Traffic to a node at the same or a lower classification than your own can be encrypted with the shared secret, while traffic to a node at a higher classification than yours must be encrypted with the public key for that classification. The shared secrets are updated at an administratively-specified frequency: new shared secrets are randomly generated, encrypted with the public key for their level, and sent to the appropriate nodes. The old secret is kept until every message encrypted with it can be expected to have arrived.
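Here's a sketch of the key choice and of rotation with a grace period, under two assumptions: each message is protected with the keys of its own classification level (which is what makes a higher-classified message unreadable to a lower node), and there is a known upper bound, `max_transit`, on how long a message can be in flight. Classifications are integer ranks; `choose_key` and `SecretRing` are hypothetical, and the returned strings are just labels, not real cryptography.

```python
import os
from dataclasses import dataclass, field


def choose_key(sender_level: int, message_level: int) -> tuple[str, int]:
    """Which key protects a message (hypothetical; integer ranks).

    Assumes a message is always protected with the keys of its own
    classification level, so only nodes cleared for it can read it.
    """
    if message_level <= sender_level:
        # The sender is cleared for this level, so it holds the shared secret.
        return ("shared-secret", message_level)
    # The sender only knows this level's public key.
    return ("public-key", message_level)


@dataclass
class SecretRing:
    """Shared secrets for one classification level, with rotation."""

    max_transit: float  # assumed upper bound (seconds) on message delivery
    current: bytes = field(default_factory=lambda: os.urandom(32))
    retired: list[tuple[bytes, float]] = field(default_factory=list)

    def rotate(self, now: float) -> bytes:
        # Keep the old secret until everything sent under it has arrived.
        self.retired.append((self.current, now + self.max_transit))
        self.current = os.urandom(32)
        # The caller would encrypt this with the level's public key and send it.
        return self.current

    def usable(self, now: float) -> list[bytes]:
        """Secrets that may still decrypt incoming traffic at time `now`."""
        self.retired = [(s, t) for s, t in self.retired if t > now]
        return [self.current] + [s for s, _ in self.retired]
```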
I've not yet thought out the details, but it might be useful to encrypt data stored on disks. The keys might have to be entered manually during booting, so that cutting the power to a node in the course of stealing it renders it useless; or perhaps some clever mathematics could be used so that the node, upon booting, contacts any N nodes out of a set of M and obtains fragments of its key from them, from which it reconstructs the key. There's even some benefit in just storing the key on disk, since that makes it trivial to rapidly 'erase' a node by erasing its stored keys and then flushing all decrypted data from memory. If a different key is used for storing entities of each classification level, then one can rapidly downgrade the classification of a node (eg, as attackers are breaking down the door to the server room) by having it overwrite and forget the keys of all the classifications it is no longer eligible for.
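The "any N out of M" idea is exactly what a threshold secret-sharing scheme provides. Below is a minimal sketch using Shamir's secret sharing, one well-known choice rather than anything this design commits to: the key becomes the constant term of a random polynomial of degree N-1 over a prime field, each of the M nodes holds one point on it, and any N points recover the key by Lagrange interpolation at zero. `split_secret` and `recover_secret` are hypothetical names, and distributing and authenticating the fragments is not shown.

```python
import secrets

# The Mersenne prime 2**521 - 1, comfortably larger than a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points, any `threshold` of which recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


key = secrets.token_bytes(32)                  # a 256-bit disk key
as_int = int.from_bytes(key, "big")
fragments = split_secret(as_int, threshold=3, shares=5)
recovered = recover_secret(fragments[1:4]).to_bytes(32, "big")  # any 3 will do
```

Fewer than N fragments reveal nothing about the key, so a thief who grabs a node and a couple of its peers still can't decrypt the disk.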
Security between clusters
Security between clusters is more interesting. Any cluster may contact any other over the public Internet via MERCURY, packet filters permitting.
Every cluster has a public keypair, the private key of which is known to every node in the cluster, and the public key of which is part of the public ID of every entity within the cluster.
Access control (and other trust decisions) between clusters is based upon the entities originating the messages. However, we only actually authenticate the originating cluster, and when ensuring that our messages cannot be snooped, we merely ensure that they reach the correct cluster. This is because we cannot trust an entity any more than we can trust the cluster that hosts it: if we trust an entity with some information, we cannot really tell whether the cluster hosting that entity is sending the information to that entity or to some other, so there is no point in authenticating at a finer granularity than the cluster.
Every cluster maintains its own mapping from classifications to inter-cluster communications protection algorithms. The job of such an algorithm is, given the source cluster's public key pair and just the target cluster's public key, to convey sequences of bytes (raw MERCURY messages) over a lower-level IRIDIUM transport, while ensuring that only the target cluster can recover the message, that no man-in-the-middle can recover the message or alter it without detection, and that the target cluster can check that the source cluster sent the message.
Of course, how well an algorithm does this job varies wildly. There's a null algorithm, which just sends the bytes as-is with no encryption or signing whatsoever. This is pretty naff from a security perspective, but it's fast, so it might be used for 'public' classified communications.
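To pin down what "algorithm" means here, this is a hypothetical interface with the null algorithm as its simplest instance. The names `ProtectionAlgorithm`, `wrap` and `unwrap` are assumptions, the key arguments are left opaque, and the IRIDIUM transport that would carry the wrapped bytes is not modelled.

```python
from abc import ABC, abstractmethod


class ProtectionAlgorithm(ABC):
    """Hypothetical interface for an inter-cluster protection algorithm.

    Key arguments are opaque here; a real algorithm would use the source
    cluster's key pair and the target cluster's public key.
    """

    name: str

    @abstractmethod
    def wrap(self, message: bytes, source_keypair: object,
             target_public_key: bytes) -> bytes:
        """Turn a raw MERCURY message into bytes for the transport."""

    @abstractmethod
    def unwrap(self, wire: bytes, source_public_key: bytes,
               target_keypair: object) -> bytes:
        """Recover the message, raising ValueError if it fails verification."""


class NullAlgorithm(ProtectionAlgorithm):
    """Sends bytes as-is: no secrecy, no integrity, no authentication."""

    name = "null"

    def wrap(self, message, source_keypair, target_public_key):
        return message

    def unwrap(self, wire, source_public_key, target_keypair):
        return wire
```

A circuit-based algorithm with session keys would keep its negotiated state on the instance, which is why a per-message `wrap`/`unwrap` pair is only a first approximation.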
On the other hand, a better algorithm might open an IRIDIUM virtual circuit and negotiate a session key, signing requests so that each end can check the identity of the other, then proceed to exchange messages using a modern block cipher keyed with the session key, renegotiating that session key frequently.
Assuming that algorithms try to be fast in the common case by doing session key negotiation only at virtual circuit setup and shutdown, MERCURY will cache already-negotiated algorithm channels between nodes and reuse them across many messages.
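A minimal sketch of such a cache, keyed by the two nodes and the algorithm. `ChannelCache`, its methods and the `open_channel` callback (which would do the expensive session key negotiation) are hypothetical names.

```python
from typing import Callable, Dict, Tuple

ChannelKey = Tuple[str, str, str]  # (local node, remote node, algorithm name)


class ChannelCache:
    """Hypothetical cache of negotiated protection-algorithm channels."""

    def __init__(self, open_channel: Callable[[str, str, str], object]) -> None:
        self._open = open_channel  # expensive: negotiates session keys
        self._channels: Dict[ChannelKey, object] = {}

    def get(self, local: str, remote: str, algorithm: str) -> object:
        key = (local, remote, algorithm)
        channel = self._channels.get(key)
        if channel is None:
            channel = self._open(local, remote, algorithm)
            self._channels[key] = channel
        return channel

    def drop(self, local: str, remote: str, algorithm: str) -> None:
        # Forget a channel, eg when its virtual circuit is shut down.
        self._channels.pop((local, remote, algorithm), None)
```

Including the algorithm in the key matters: two messages to the same node at different classifications may need different algorithms, and so different channels.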
Now, when a node sends a message to a node in another cluster, it considers the classification of the message and attempts to use the algorithm its cluster is configured to use for that classification. Classification hierarchies are unique to each cluster, so when the message (or the initial request to set up shared session keys, etc) is received by the destination, the destination uses its own mapping between classifications and algorithms to decide which of its classifications this algorithm represents, and tags the incoming message with that classification.
If the destination cluster does not recognise the algorithm, it replies with a rejection, specifying the list of algorithms it does know. The sending node then finds an algorithm it will trust with the message (eg, an algorithm associated with the message's classification or higher) that is in the list, and uses that. If there is none, then it must sadly fail!
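The negotiation can be sketched as three small functions over each cluster's mapping from classification rank to algorithm name. The function names and algorithm names are hypothetical, and two choices are assumptions of mine: the sender prefers the algorithm tied to the lowest sufficient classification (likely the cheapest), and if several local classifications share an algorithm, the receiver tags the message with the highest of them, erring towards restricting who may see it.

```python
from typing import Dict, Iterable, Optional


def first_choice(mapping: Dict[int, str], message_level: int) -> str:
    # The algorithm the sender's cluster uses for this classification.
    return mapping[message_level]


def fallback(mapping: Dict[int, str], message_level: int,
             offered: Iterable[str]) -> str:
    """Pick an algorithm the sender trusts for this message (one tied to the
    message's classification or higher) from the receiver's list."""
    acceptable = set(offered)
    for level in sorted(lv for lv in mapping if lv >= message_level):
        if mapping[level] in acceptable:
            return mapping[level]
    raise LookupError("no acceptable algorithm; the send must fail")


def tag_incoming(mapping: Dict[int, str], algorithm: str) -> Optional[int]:
    """Receiver: map an algorithm onto a local classification, or None to
    reject (replying with the list of algorithms it does know)."""
    levels = [lv for lv, alg in mapping.items() if alg == algorithm]
    return max(levels) if levels else None


sender = {0: "null", 1: "alg-b", 2: "alg-c"}
receiver = {0: "null", 5: "alg-c"}
```

Here a level-1 message first tries `alg-b`, which the receiver rejects; the sender then falls back to `alg-c`, which it trusts even more, and the receiver tags the message with its own level 5.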
Access control decisions at the destination of a message are generally made based upon the originating entity ID. Since only the cluster part of the originating entity ID is actually checked cryptographically (remembering that the cluster's public key is part of the entity ID), we really only know which cluster the request came from; we trust that cluster to correctly identify the entity sending the message, since if the cluster is 'bad', it could just make the entity in question 'bad' too.
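A minimal sketch of that split: the cluster part of the claimed entity ID must match the cluster key the protection algorithm actually authenticated, while the local part is taken on trust. `EntityID`, `origin_of` and `permitted` are hypothetical names, and the key bytes are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityID:
    """Hypothetical public entity ID: the cluster key plus a local part."""

    cluster_public_key: bytes  # checked cryptographically
    local_id: str              # taken on trust from that cluster


def origin_of(claimed: EntityID, authenticated_cluster_key: bytes) -> EntityID:
    """Accept the claimed sender only if it belongs to the cluster that the
    protection algorithm actually authenticated."""
    if claimed.cluster_public_key != authenticated_cluster_key:
        raise PermissionError("sender claims an entity in another cluster")
    return claimed


def permitted(acl: frozenset, origin: EntityID) -> bool:
    # Access control is by whole entity ID, trusting the cluster for the rest.
    return origin in acl
```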
However, it is possible for an entity to act on behalf of another entity for a while. For example, when you log onto a desktop computer, you (in effect) tell that computer what your user entity is, then enter a password so that the computer can demonstrate to your user entity that it's really you. Your user entity then sends the computer your favourite user-interface software and settings, but as you browse CARBON from the computer, the entities you interact with must see your actions as coming from your user entity, not from the computer you're using, or else they will be unable to make useful access-control decisions.
This could be done by proxying all your activities through your user entity, but that would not be very efficient. Instead, the user interface software keeps a connection to your user entity open for the duration of your session, and every minute, your user entity sends it a certificate (signed by the cluster as coming from that entity), stating that the user interface's entity may act as your user entity for the next two minutes.
This certificate is then sent along with every message issued on your behalf by the user interface (or early on in every virtual circuit, and left out thereafter). The messages are still signed by the algorithm chosen for communications between the user interface node and the target node, but with the certificate wrapped within. Upon seeing the certificate, the recipient checks that the entity the certificate authorises to act on the issuer's behalf is the same entity that's originating the request, and that the certificate has not expired, and then treats the request as having come from the entity that issued the certificate (keeping the identity of the intermediate entity for auditing purposes).
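Issuing and checking such a certificate might look like the sketch below. `DelegationCert`, `issue` and `effective_origin` are hypothetical, as is the byte encoding of the signed fields; `sign` and `verify` are placeholder callables standing in for the issuing cluster's signature, not a real signature scheme.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

VALIDITY = 120.0  # seconds: each certificate lasts two minutes


@dataclass(frozen=True)
class DelegationCert:
    issuer: str        # the user entity being acted for
    delegate: str      # the user-interface entity allowed to act as it
    expires_at: float  # absolute time, in seconds
    signature: bytes   # the issuing cluster's signature over the fields above


def _signed_bytes(issuer: str, delegate: str, expires_at: float) -> bytes:
    return f"{issuer}|{delegate}|{expires_at!r}".encode()


def issue(issuer: str, delegate: str, now: float,
          sign: Callable[[bytes], bytes]) -> DelegationCert:
    expires_at = now + VALIDITY
    return DelegationCert(issuer, delegate, expires_at,
                          sign(_signed_bytes(issuer, delegate, expires_at)))


def effective_origin(cert: DelegationCert, requester: str, now: float,
                     verify: Callable[[bytes, bytes], bool]) -> Tuple[str, str]:
    """Return (entity for access control, intermediary kept for auditing)."""
    body = _signed_bytes(cert.issuer, cert.delegate, cert.expires_at)
    if not verify(body, cert.signature):
        raise PermissionError("certificate signature does not verify")
    if cert.delegate != requester:
        raise PermissionError("certificate was issued to a different entity")
    if now >= cert.expires_at:
        raise PermissionError("certificate has expired")
    return cert.issuer, requester
```

The user entity would call `issue` once a minute; because each certificate is valid for two, a renewal that arrives a little late doesn't cut the session off.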