first database draft

2020-09-06 00:35:08 +02:00 · 2020-09-06 00:35:08 +02:00 · d7bbb5396c
parent 099249fc41
commit d7bbb5396c
5 changed files with 75 additions and 11 deletions
--- a/docs/database.md
+++ b/docs/database.md
@@ -0,0 +1,39 @@
+# Database structure
+## Store table
+
+The store table holds a reference to every object present in the database. Its primary job is to link file hashes to their physical location on the filesystem.
+
+Note: Strictly speaking, locating a file path by its hash does not require a separate table; it could be represented by a reserved attribute in the *Data table*. However, since file operations are one of UpEnd's core features, nothing prevents making this explicit in the database's structure. Doing so simplifies the implementation and presumably improves performance.
+
+### Columns
+- `id` - an integer autoincrementing primary key.
+- `hash` - the [multihash](https://github.com/multiformats/multihash) of a file or an object.
+- `path` - the filesystem path of the file with this `hash`.
+- `size` - size in bytes, to speed up comparison on vault updates.
+- `ts` - UNIX timestamp of when the file was first seen.
+- `valid` - whether the path still exists on the hard drive (for historical purposes mostly).
+
+## Data table
+
+This is where all of the structure of UpEnd lives. It is heavily inspired by [DataScript](https://github.com/tonsky/datascript) (which was in turn inspired by [Datomic](https://datomic.com)), and encouraged by greglook's [merkledag](https://github.com/greglook/merkledag-core) (ex-[vault](https://github.com/greglook/vault)).
+
+### Columns
+- `identity` - the multihash of the entry's content.
+- `target` - identity (hash OR UUID\*) of the object this entry refers to.
+- `key` - the name of the attribute. The "key" of the key/value pair.
+- `value` - JSON-encoded content of the attribute. The "value" of the key/value pair.
+
+#### The `target` column
+This is the only problematic aspect of this table.
+
+Since Datomic's "entity-ids" are simple integers that do not reflect the content of the entities in any way, they can be created at will. This presents a problem for UpEnd, which is content-addressable at its core.
+
+If an attribute attaches to a file (which has an intrinsic hash) or to another attribute (which has a computed hash, under which it is stored), the `target` can simply be that object's hash. A problem arises, however, when an attribute (or rather, a collection of attributes) should point to no existing object. This is the case for objects that consist purely of metadata (K/V pairs) and do not exist on disk, e.g. a "contact".
+
+One approach would be to set the `target` field of those attributes (name, address, etc.) to `NULL`. However, there would then be no way to tell apart attributes belonging to different contacts, because all attributes constituting any "contact" would point to the same `NULL`. It would also be possible to generate random data, store that object in the database (or on disk), and attach attributes to it, but that is wasteful, unnecessary, and inelegant.
+
+Since the object need not exist in the first place, and all that is required is that all attributes belonging to a single "valueless object" share the same `target` (as in Datomic, where entity-ids are meaningless), a UUID is generated and used in place of an actual hash. This UUID then conceptually represents an object in its own right, even though it exists in neither the *Store table* nor the *Data table*.
+
+
+#### The `key` column
+In this initial version, for simplicity of implementation, the `key` of an entry is a plain string. In the future it may be converted to a hash value as well, which would prevent collisions, add flexibility, and allow annotations and metadata to attach to attributes too.
--- a/migrations/upend/2020-08-04-134952_file_hashes/down.sql
+++ b/migrations/upend/2020-08-04-134952_file_hashes/down.sql
--- a/migrations/upend/00_initial_structure/up.sql
+++ b/migrations/upend/00_initial_structure/up.sql
@@ -0,0 +1,21 @@
+CREATE TABLE files (
+    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
+    hash VARCHAR NOT NULL,
+    path VARCHAR NOT NULL,
+    size BIGINT NOT NULL,
+    ts DATETIME NOT NULL,
+    valid BOOLEAN NOT NULL DEFAULT TRUE
+);
+
+CREATE INDEX files_hash ON files(hash);
+CREATE INDEX files_path ON files(path);
+CREATE INDEX files_valid ON files(valid);
+
+CREATE TABLE data (
+    identity BLOB PRIMARY KEY NOT NULL,
+    target BLOB NOT NULL,
+    key VARCHAR NOT NULL,
+    value VARCHAR NOT NULL
+);
+
+CREATE INDEX data_target ON data(target);
--- a/migrations/upend/2020-08-04-134952_file_hashes/up.sql
+++ b/migrations/upend/2020-08-04-134952_file_hashes/up.sql
@@ -1,11 +0,0 @@
-CREATE TABLE files (
-    id INTEGER PRIMARY KEY AUTOINCREMENT,
-    hash VARCHAR NOT NULL,
-    path VARCHAR NOT NULL,
-    size BIGINT NOT NULL,
-    valid BOOLEAN NOT NULL DEFAULT TRUE
-);
-
-CREATE INDEX files_hash ON files(hash);
-CREATE INDEX files_path ON files(path);
-CREATE INDEX files_valid ON files(valid);
--- a/src/schema.rs
+++ b/src/schema.rs
@@ -1,9 +1,24 @@
+table! {
+    data (identity) {
+        identity -> Binary,
+        target -> Binary,
+        key -> Text,
+        value -> Text,
+    }
+}
+
 table! {
    files (id) {
        id -> Integer,
        hash -> Text,
        path -> Text,
        size -> BigInt,
+        ts -> Timestamp,
        valid -> Bool,
    }
 }
+
+allow_tables_to_appear_in_same_query!(
+    data,
+    files,
+);