Thursday, October 25, 2012

Working with data

The keyword here is reusability. I want an environment with totally independent parts, where the same components serve expression evaluation, interpreting an algorithm, and accessing object member values from code. I want both string-based access and compile-time / run-time support for type and name checking. Let's see how this goes.

Identifier

It all starts with this. The Identifier has a short, limited ASCII "name", which must be unique in its context, and a reference to that context, which can be either a type declaration or an actual context (like a code block in an interpreter environment).
The Identifier instance can be requested from its context by its name, so there is exactly one instance per name in a context. Because it adds the context information to the string name, name collisions between contexts are avoided.
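
A minimal Java sketch of the lookup described above. The names `IdContext` and `identifier(...)` are my own assumptions for illustration, not an existing API; the only point is that a context hands out one shared Identifier instance per name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an Identifier is a name plus the context it belongs to.
final class Identifier {
    final String name;
    final IdContext context;

    Identifier(String name, IdContext context) {
        this.name = name;
        this.context = context;
    }

    @Override
    public String toString() {
        return context.label + "." + name;
    }
}

// Hypothetical context (a type declaration or a code block) that owns its identifiers.
final class IdContext {
    final String label;
    private final Map<String, Identifier> identifiers = new HashMap<>();

    IdContext(String label) {
        this.label = label;
    }

    // Returns the same Identifier instance for the same name within this context.
    Identifier identifier(String name) {
        return identifiers.computeIfAbsent(name, n -> new Identifier(n, this));
    }
}

class IdentifierDemo {
    public static void main(String[] args) {
        IdContext person = new IdContext("Person");
        IdContext company = new IdContext("Company");

        // Same context, same name: one shared instance.
        System.out.println(person.identifier("name") == person.identifier("name")); // true
        // Same name in another context: a different Identifier, so no collision.
        System.out.println(person.identifier("name") == company.identifier("name")); // false
        System.out.println(company.identifier("name")); // Company.name
    }
}
```

Because the instance is unique per context, later components can compare identifiers by reference instead of comparing strings.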

Variant

The Variant is the core data element; a wrapper that holds the actual data value. It supports

  • generic values like boolean, integer, floating point values and fixed length ASCII strings;
  • single object reference to another object;
  • collection of generic values / references. The collection has two flags: sorted, if the elements have a fixed enumeration order, and single, if one element can appear only once in the collection.

The Variant also has a VariantDeclaration reference, where all meta information of that variant instance is stored. This means that a separate Variant exists for every actual value "anywhere"; Variants can travel through expressions, contexts, etc., but they always hold a reference to a data structure that defines them and makes the actual value understandable.

Another idea: it holds a reference to the DataObject in which it lives AND a declaration index, because in many cases I must go "up" from the Variant to the owner DataObject. And the final blow: the Variant has a direct access and modification interface (to be used in expressions and interpreters), but it must invoke the change listeners of the owner DataObject, if one exists. This requires a DataObject reference in the Variant.
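
The owner reference plus index idea can be sketched in Java roughly as follows. Everything here is an assumption made for illustration: `ChangeListener`, `OwnerObject` (a stripped-down stand-in for DataObject) and the method names are hypothetical, and the declaration is reduced to a plain name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface; the text above does not define one.
interface ChangeListener {
    void changed(String memberName, Object oldValue, Object newValue);
}

// Minimal stand-in for DataObject: declaration names plus change listeners.
final class OwnerObject {
    final String[] declarationNames;
    final List<ChangeListener> listeners = new ArrayList<>();

    OwnerObject(String... declarationNames) {
        this.declarationNames = declarationNames;
    }
}

final class Variant {
    private final OwnerObject owner;     // null for a free-standing Variant
    private final int declarationIndex;  // position in the owner's declarations
    private Object value;

    Variant(OwnerObject owner, int declarationIndex, Object value) {
        this.owner = owner;
        this.declarationIndex = declarationIndex;
        this.value = value;
    }

    Object get() {
        return value;
    }

    // Direct modification, but the owner's listeners are informed if an owner exists.
    void set(Object newValue) {
        Object oldValue = value;
        value = newValue;
        if (owner != null) {
            String memberName = owner.declarationNames[declarationIndex];
            for (ChangeListener listener : owner.listeners) {
                listener.changed(memberName, oldValue, newValue);
            }
        }
    }
}

class VariantDemo {
    public static void main(String[] args) {
        OwnerObject person = new OwnerObject("name", "age");
        person.listeners.add((member, oldV, newV) ->
                System.out.println(member + ": " + oldV + " -> " + newV));

        Variant age = new Variant(person, 1, 41);
        age.set(42); // prints "age: 41 -> 42"
    }
}
```

The Variant reaches its meta information only through the owner and the index, which is what lets the owner keep per-instance state for each declaration.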

VariantDeclaration

This contains the variant identifier and all other information (type, access, life cycle, etc.). The declaration may come from a type or from a direct request in an interpreter. The important part is that one declaration is generally shared by several Variant instances, so it is lifted out of them.

The life cycle information (like "already set" or "final") may belong to an object instance. This forces the Variant to refer to the DataObject with a declaration index rather than to the declaration directly.

DataObject

The DataObject contains an array of VariantDeclarations and an array of Variants; it acts as an Aspect instance inside an Entity, but also as the context for a running system, or as the code block stack in an interpreted algorithm.

To mix flexibility with memory optimization, the most used DataObjects (Aspects) start with their declaration array initialized with the array from their type, which is an immutable shared array instance and sufficient for most cases. However, the instance may be referred to from other aspects, which results in a reverse reference to the referring aspects, described by an alien VariantDeclaration. When this happens, the array is copied to a mutable declaration array, and the new VariantDeclaration is added to this DataObject together with the new Variant.
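
This is essentially a copy-on-write scheme. Here is a small Java sketch of it, under simplifying assumptions: declarations are reduced to plain strings, the class name `SharingDataObject` and its methods are hypothetical, and the shared type array is only treated as read-only by convention (Java arrays cannot be made immutable).

```java
import java.util.Arrays;

final class SharingDataObject {
    private final String[] typeDeclarations; // shared by all instances of the type
    private String[] declarations;           // starts out as the shared array
    private Object[] values;

    SharingDataObject(String[] typeDeclarations) {
        this.typeDeclarations = typeDeclarations;
        this.declarations = typeDeclarations; // no copy yet
        this.values = new Object[typeDeclarations.length];
    }

    // Adds an "alien" declaration to this instance only and returns its index.
    int addDeclaration(String name) {
        // Arrays.copyOf always allocates a new array, so the shared one is never modified.
        declarations = Arrays.copyOf(declarations, declarations.length + 1);
        values = Arrays.copyOf(values, values.length + 1);
        int index = declarations.length - 1;
        declarations[index] = name;
        return index;
    }

    boolean sharesTypeDeclarations() {
        return declarations == typeDeclarations;
    }

    int declarationCount() {
        return declarations.length;
    }

    void set(int index, Object value) {
        values[index] = value;
    }

    Object get(int index) {
        return values[index];
    }
}

class SharingDemo {
    public static void main(String[] args) {
        String[] personType = {"name", "age"};
        SharingDataObject a = new SharingDataObject(personType);
        SharingDataObject b = new SharingDataObject(personType);

        int backRef = a.addDeclaration("employerBackRef");
        a.set(backRef, "some referring aspect");

        System.out.println(a.sharesTypeDeclarations()); // false
        System.out.println(b.sharesTypeDeclarations()); // true
        System.out.println(personType.length);          // 2
    }
}
```

Only the instance that received the extra declaration pays for its own array; every other instance of the type keeps pointing at the shared one.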

The same applies to the generic DataObjects acting as context: they have no initial declaration array, but it is extended on each variant declaration request. The only difference: the declarations themselves can be local strings, in which case they must be unique in the actual context. However, types may contain "shared" member declarations, which means they are contained not by the Aspect instance but by the identified Context (this implements the "static member" feature of the programming language, and extends it by enabling different Context levels like runtime, session, user, ...)

Field

Field is used when we access the Variant from the DataObject; to do so we need the object itself, and an Identifier, which is not just a name but a context as well. It is also important that the Field itself is more tightly bound to the Identifier than to the object: it is fine to access the same Variant in multiple objects through the same Field instance, but it is not good to switch a Field to access another member Variant of the same object. Consequently, the Identifier is final, while the object is mutable. You can get a Field directly from an object by providing the requested identifier, and the same Field can then be reused for other objects that have the same member.

Internally, the Field is responsible for eliminating identifier-based string lookups. It holds an object reference and a VariantDeclaration index. When (and only when) you call setObject with a new reference, the identifier is searched in the declaration array of that object, and the index is stored. Whenever you access the content, it is just an indexed access to the variant array of the object behind the field.
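
A minimal Java sketch of this caching behaviour. Only `setObject` is named in the text above; `FieldObject`, the plain string identifier and the `get`/`set` methods are simplifying assumptions of mine.

```java
// Minimal stand-in for a DataObject: parallel arrays of declaration names and values.
final class FieldObject {
    final String[] declarations;
    final Object[] values;

    FieldObject(String... declarations) {
        this.declarations = declarations;
        this.values = new Object[declarations.length];
    }
}

final class Field {
    private final String identifier; // final: a Field never switches to another member
    private FieldObject object;      // mutable: the Field can move between objects
    private int index = -1;

    Field(String identifier) {
        this.identifier = identifier;
    }

    // The only place where the identifier is searched; the index is cached afterwards.
    void setObject(FieldObject newObject) {
        int found = -1;
        for (int i = 0; i < newObject.declarations.length; i++) {
            if (newObject.declarations[i].equals(identifier)) {
                found = i;
                break;
            }
        }
        if (found < 0) {
            throw new IllegalArgumentException("No member named " + identifier);
        }
        object = newObject;
        index = found;
    }

    // Plain indexed access, no string lookup.
    Object get() {
        return object.values[index];
    }

    void set(Object value) {
        object.values[index] = value;
    }
}

class FieldDemo {
    public static void main(String[] args) {
        FieldObject first = new FieldObject("name", "age");
        FieldObject second = new FieldObject("id", "age", "name");

        Field name = new Field("name");
        name.setObject(first);
        name.set("Alice");
        name.setObject(second); // same Field, different object and different index
        name.set("Bob");

        System.out.println(first.values[0]);  // Alice
        System.out.println(second.values[2]); // Bob
    }
}
```

The cost of the name search is paid once per setObject call, so repeated reads and writes on the same object stay cheap.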

Fields are mostly used in declarative "toolkit" components, where the actual context "knows" the required type and the field is identified by the declaration name within the type, so you can manipulate the content through the Field instances. Most such environments work with multiple object instances of the same type (like a GUI panel, a string template, a table, etc.), for which the Field's final lock to a specific declaration is also a good feature.

FieldSet

A helper component for dynamic access environments: you provide a type and an array of strings, which are internally converted into an array of identifiers and Field instances. You then have indexed access to the values, where the index is in sync with the index of the field name in the string array.
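
A rough Java sketch of the idea, simplified on purpose: instead of real Identifier and Field instances it keeps a plain index array, and it assumes every object passed in uses the declaration order of the given type. `TypedObject` and the `get`/`set` signatures are hypothetical.

```java
// Minimal stand-in for an object of a known type.
final class TypedObject {
    final Object[] values;

    TypedObject(int memberCount) {
        this.values = new Object[memberCount];
    }
}

final class FieldSet {
    // indices[i] is the position of names[i] in the type's declaration array.
    private final int[] indices;

    FieldSet(String[] typeDeclarations, String... names) {
        indices = new int[names.length];
        for (int i = 0; i < names.length; i++) {
            indices[i] = indexOf(typeDeclarations, names[i]);
        }
    }

    private static int indexOf(String[] declarations, String name) {
        for (int i = 0; i < declarations.length; i++) {
            if (declarations[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown member: " + name);
    }

    // fieldIndex refers to the position in the name array given to the constructor.
    Object get(TypedObject object, int fieldIndex) {
        return object.values[indices[fieldIndex]];
    }

    void set(TypedObject object, int fieldIndex, Object value) {
        object.values[indices[fieldIndex]] = value;
    }
}

class FieldSetDemo {
    public static void main(String[] args) {
        String[] personType = {"id", "name", "age", "email"};
        // A table that shows only two columns, in its own order.
        FieldSet columns = new FieldSet(personType, "email", "name");

        TypedObject row = new TypedObject(personType.length);
        columns.set(row, 0, "alice@example.org");
        columns.set(row, 1, "Alice");

        System.out.println(row.values[3]);       // alice@example.org
        System.out.println(columns.get(row, 1)); // Alice
    }
}
```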

TypeWrapper

The heavy-weight wrapper for an aspect. This is code generated from the type declaration in the target programming language. It contains a FieldSet for the type, a reference to an actual Aspect instance, and typed get/set functions, named after the variants, that perform the type casts. The TypeWrapper actually looks like a "normal object" to the programmer. For each Unit declaration, the wrappers can be generated and used by anyone who wants to use the objects from their native code. The important feature here is that the actual data binding between the wrapper and the data object instance is done at run time, by the declarations and identifiers, not by any compile-time fixed tables. This really allows extending the types without breaking the caller code.
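
To make this concrete, here is what a generated wrapper for a hypothetical "Person" type with "name" and "age" members might look like in Java. This is only a sketch under my own assumptions: `Aspect` is a minimal stand-in, `bind` is an invented method name, and the FieldSet is reduced to two resolved indices.

```java
// Minimal stand-in for an Aspect instance.
final class Aspect {
    final String[] declarations;
    final Object[] values;

    Aspect(String... declarations) {
        this.declarations = declarations;
        this.values = new Object[declarations.length];
    }

    int indexOf(String name) {
        for (int i = 0; i < declarations.length; i++) {
            if (declarations[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("No member named " + name);
    }
}

// Hypothetical generated wrapper: typed accessors over a run-time binding.
final class PersonWrapper {
    private Aspect aspect;
    private int nameIndex;
    private int ageIndex;

    // Binding happens at run time, by name, not through compile-time offsets.
    void bind(Aspect newAspect) {
        nameIndex = newAspect.indexOf("name");
        ageIndex = newAspect.indexOf("age");
        aspect = newAspect;
    }

    String getName() {
        return (String) aspect.values[nameIndex];
    }

    void setName(String name) {
        aspect.values[nameIndex] = name;
    }

    int getAge() {
        return (Integer) aspect.values[ageIndex];
    }

    void setAge(int age) {
        aspect.values[ageIndex] = age;
    }
}

class TypeWrapperDemo {
    public static void main(String[] args) {
        Aspect original = new Aspect("name", "age");
        // A later version of the type with extra members and a different order.
        Aspect extended = new Aspect("id", "email", "age", "name");

        PersonWrapper person = new PersonWrapper();
        person.bind(original);
        person.setName("Alice");
        person.setAge(30);

        person.bind(extended); // the same wrapper code still works
        person.setName("Bob");
        person.setAge(40);

        System.out.println(original.values[0]); // Alice
        System.out.println(extended.values[3]); // Bob
        System.out.println(person.getAge());    // 40
    }
}
```

The caller only ever sees getName and setAge; the layout of the underlying object can change between versions as long as the named members still exist.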

It also contains static accessor functions, which are translated to getting and casting the referred items from the declared context (like "Logger.getLog" returns the log object in the closest context: transaction, session, system). The wrapper also incorporates message sending, but this is another story. When working and writing code on this level, it should feel almost like coding without Dust in the background.