Database structure

Store table

The Store table contains a reference to every object present in the database. Its primary purpose is to link file hashes to their physical locations on the filesystem.

Note: Strictly speaking, locating a file's path by its hash does not require a separate table; it could be represented by a reserved attribute in the Data table. However, since file operations are one of the core features of UpEnd, nothing prevents making this mapping concrete in the database's structure. Doing so simplifies the implementation and presumably improves performance.

Columns

  • id - an integer autoincrementing primary key.
  • hash - the multihash of a file or an object.
  • path - the filepath leading to the file of this hash.
  • size - size in bytes, to speed up comparison on vault updates.
  • ts - UNIX timestamp of when the file was first seen.
  • valid - whether the path still exists on the hard drive (for historical purposes mostly).
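
The column list above can be sketched as a SQLite table driven from Python. This is only an illustration: the column names come from the list, but the column types, the index, and the helper names (`open_store`, `add_file`, `locate`) are assumptions and not UpEnd's actual schema or API.

```python
import sqlite3
import time

# Hypothetical DDL: column names follow the list above, types are assumed.
STORE_SCHEMA = """
CREATE TABLE store (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    hash  BLOB    NOT NULL,            -- multihash of the file or object
    path  TEXT    NOT NULL,            -- where the file lives on disk
    size  INTEGER NOT NULL,            -- size in bytes
    ts    INTEGER NOT NULL,            -- UNIX timestamp of first sighting
    valid INTEGER NOT NULL DEFAULT 1   -- 0 once the path no longer exists
);
CREATE INDEX store_hash ON store (hash);
"""


def open_store(db_path=":memory:"):
    """Open a database and create the (assumed) Store table."""
    conn = sqlite3.connect(db_path)
    conn.executescript(STORE_SCHEMA)
    return conn


def add_file(conn, file_hash, path, size, ts=None):
    """Record that `file_hash` was seen at `path`; returns the new row id."""
    ts = int(time.time()) if ts is None else ts
    cur = conn.execute(
        "INSERT INTO store (hash, path, size, ts) VALUES (?, ?, ?, ?)",
        (file_hash, path, size, ts),
    )
    return cur.lastrowid


def locate(conn, file_hash):
    """Return every still-valid path recorded for the given hash."""
    rows = conn.execute(
        "SELECT path FROM store WHERE hash = ? AND valid = 1", (file_hash,)
    )
    return [row[0] for row in rows]
```

Keeping invalidated rows (setting `valid` to 0 instead of deleting them) matches the "historical purposes" remark above: the row stays in the table, but `locate` stops returning its path.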

Data table

This is where all of the structure of UpEnd lives. It is heavily inspired by DataScript (which was in turn inspired by Datomic), and encouraged by greglook's merkledag (ex-vault).

Columns

  • identity - the multihash of the entry's content.
  • target - identity (hash OR UUID, see "The target column") of the object this entry refers to.
  • key - the name of the attribute. The "key" of the key/value pair.
  • value - JSON-encoded content of the attribute. The "value" of the key/value pair.
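
To make the four columns concrete, here is a minimal sketch of how an entry could be stored so that its identity is derived from its own content. The column types, the canonical serialization used for hashing, and the helper names (`sha256_multihash`, `entry_identity`, `add_entry`) are assumptions made for illustration; the document does not specify how UpEnd actually computes an entry's hash. The two-byte prefix is the standard multihash framing for SHA2-256 (code `0x12`, digest length `0x20`).

```python
import hashlib
import json
import sqlite3

# Hypothetical DDL: column names follow the list above, types are assumed.
DATA_SCHEMA = """
CREATE TABLE data (
    identity BLOB PRIMARY KEY,  -- multihash of the entry's content
    target   BLOB NOT NULL,     -- hash or UUID of the object referred to
    key      TEXT NOT NULL,     -- attribute name
    value    TEXT NOT NULL      -- JSON-encoded attribute value
);
"""


def sha256_multihash(payload: bytes) -> bytes:
    """Frame a SHA2-256 digest as a multihash (0x12 = sha2-256, 0x20 = 32 bytes)."""
    return b"\x12\x20" + hashlib.sha256(payload).digest()


def entry_identity(target: bytes, key: str, value) -> bytes:
    """Hash an entry's content. The canonical form used here is an assumption."""
    canonical = json.dumps(
        [target.hex(), key, value], sort_keys=True, separators=(",", ":")
    )
    return sha256_multihash(canonical.encode("utf-8"))


def add_entry(conn, target: bytes, key: str, value) -> bytes:
    """Insert an entry; re-inserting identical content leaves the table unchanged."""
    identity = entry_identity(target, key, value)
    conn.execute(
        "INSERT OR IGNORE INTO data (identity, target, key, value) "
        "VALUES (?, ?, ?, ?)",
        (identity, target, key, json.dumps(value)),
    )
    return identity
```

Because `add_entry` returns the entry's identity, that value can itself be passed as the `target` of a further entry, which is how an attribute can be attached to another attribute.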

The target column

This is the only problematic aspect of this table.

Datomic's "entity-ids" are simple integers that do not reflect the content of the entities in any way, so they can be created at will. This is not the case in UpEnd, which is content-addressable at its core.

If an attribute attaches to a file (which has an intrinsic hash) or to another attribute (which has a computed hash, under which it is stored), the target can simply be that object's hash. A problem arises, however, when an attribute (or rather, a collection of attributes) should point to no existing object. This is the case for objects that consist purely of metadata (K/V pairs) and do not exist on disk, e.g. a "contact".

One approach would be to simply set the target field of those attributes (name, address, etc.) to NULL. However, there would then be no way to tell apart attributes belonging to different contacts, because all attributes constituting any "contact" would point to the same NULL. It would also be possible to generate random data, store that object in the database (or on disk), and attach attributes to it, but that is wasteful, unnecessary, and inelegant.

Since the object need not exist in the first place, and all that is required is that all attributes belonging to a single "valueless object" share the same target (as in Datomic, where entity-ids are meaningless), a UUID is generated and used in place of an actual hash. This UUID then conceptually represents an object in its own right, even though it exists in neither the Store table nor the Data table.
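
The UUID approach can be illustrated with a short sketch. It uses a simplified Data table without the identity column, and the helper names (`open_db`, `new_object_target`, `attach`, `attributes_of`) as well as the choice of a random version 4 UUID are assumptions for the example, not UpEnd's actual implementation.

```python
import json
import sqlite3
import uuid


def open_db():
    """Create a simplified Data table (identity column omitted for brevity)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (target BLOB NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn


def new_object_target() -> bytes:
    """Mint a target for a metadata-only object (assumed: random UUIDv4)."""
    return uuid.uuid4().bytes


def attach(conn, target: bytes, key: str, value) -> None:
    """Attach one JSON-encoded attribute to the given target."""
    conn.execute(
        "INSERT INTO data (target, key, value) VALUES (?, ?, ?)",
        (target, key, json.dumps(value)),
    )


def attributes_of(conn, target: bytes) -> dict:
    """Collect all attributes that share the given target."""
    rows = conn.execute(
        "SELECT key, value FROM data WHERE target = ?", (target,)
    )
    return {key: json.loads(value) for key, value in rows}
```

Two contacts created with separate calls to `new_object_target()` keep their attributes apart, which a shared NULL target could not do. Nothing is written anywhere for the contact itself; it exists only as the common target of its attributes.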

The key column

In this initial version, for simplicity of implementation, the key of an entry is a plain string. To prevent collisions and to allow more flexibility, it may in the future be converted to a hash value as well, which would allow annotations and metadata to attach to attributes too.