Generation
Specifications generate questionnaires from a number of inputs, including:
questionnaire parameters, as defined by the specifications.
the statistics to load in the questionnaires, if any.
the reference data required to resolve the coordinates and flags of the statistics.
Generation is called upon in the following scenarios:
during specification design, to preview results.
to reload a (yet unpublished) questionnaire with the latest available statistics, effectively replacing it with a freshly generated one.
in the context of campaign population, where generation is applied in bulk.
Overview
Design-wise, we take two different views of the generation process, one broader than the other.
Strictly speaking, generation:
does not subsume rendering.
It simply produces an object model of the questionnaire which is yet to be rendered in some format.
does not involve network or database access.
It assumes that all its inputs - including statistics, reference data, and their metadata - are available in the execution context.
does not emit lifecycle notifications.
Notifications are emitted if and when the generated questionnaires are persisted in the system.
Users, on the other hand, take a broader view of the process which aligns with the usage scenarios outlined above: previewing, reloading, populating. In this view, fetching, loading, saving, and rendering become integral parts, indeed options, of the generation process.
Our design reflects both views:
we first implement generation in the domain model, as an operation on the model of specifications. This reflects the narrower, algorithmic view of the process.
we then design UI, APIs, and backend services around the domain model, so as to support the broader view.
Clients provide a number of generation directives, including:
the execution mode, i.e. whether the client wishes to wait for outcomes (sync), or observe/collect them later through the Task API (async).
the data mode, i.e. whether the generated questionnaire should be loaded with real data, synthetic data, or no data at all.
the persistence mode, i.e. whether the generated questionnaire should be saved within the system and in what group, or simply returned to the client.
the render mode, i.e. whether the generated questionnaire should be rendered and if so in what format.
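The four directives above can be sketched as a small options object. This is a minimal illustration, not the actual API: all names (the enums, the dataclass, its fields) are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical names: the source lists the directives but not their concrete API.
class ExecutionMode(Enum):
    SYNC = "sync"    # wait for outcomes
    ASYNC = "async"  # observe/collect outcomes later through the Task API

class DataMode(Enum):
    REAL = "real"
    SYNTHETIC = "synthetic"
    NONE = "none"

@dataclass
class GenerationDirectives:
    execution: ExecutionMode = ExecutionMode.SYNC
    data: DataMode = DataMode.REAL
    save_group: Optional[str] = None     # persistence mode: target group, or return-only
    render_format: Optional[str] = None  # render mode: a format name, or no rendering

# e.g. generate asynchronously and render to PDF, without saving
directives = GenerationDirectives(execution=ExecutionMode.ASYNC, render_format="pdf")
```

Leaving `save_group` and `render_format` optional mirrors the text: persistence and rendering are independent choices layered on top of the core generation step.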
Collectively, these directives support a rich array of interactions with the system, e.g.:
render a questionnaire to test specification design -- perhaps in a debug format with real data (to test selection specifications), or in a full format but with fake or no data (to test layout specifications) -- and collect the output immediately without leaving permanent traces in the system.
generate one or hundreds of questionnaires and save them un-rendered in the system in order to fully manage their lifecycle, perhaps in the context of an official campaign.
API and Service
The Generation API accepts requests and schedules them on a thread pool dedicated to long-running tasks. The task itself is carried out by the Generation Service.
If the execution mode is synchronous:
the service keeps the request open on a channel, i.e. a stream of progress-report messages that terminates with errors or results.
results are JSON objects with optional identifiers. In save mode, they include the identifier of the generated questionnaire; clients can resolve it with the Specification API. In render mode, they include the identifier of a bytestream with the rendered outcomes; clients can resolve it with the File API. As save and render modes are not mutually exclusive, results may include both identifiers.
If the execution mode is asynchronous:
the service returns immediately with the identifier of a generation task.
clients may resolve the identifier with the Task API, to poll for request processing status and eventually retrieve or observe outcomes. Alternatively, clients that have previously subscribed with the Task API may wait for state-change notifications from the server.
in both cases clients obtain Task reports that document failures or results. Results take the same form as in the synchronous case.
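The result shape described above can be illustrated with a small sketch. The payload and the field names (`questionnaire`, `render`) are assumptions for the example; the source only says results carry optional identifiers.

```python
import json

# Hypothetical result payload: per the text, results are JSON objects with
# optional identifiers for the saved questionnaire and/or the rendered bytestream.
payload = '{"questionnaire": "q-123", "render": "f-456"}'

def outcome_ids(result):
    """Return the optional identifiers carried by a generation result."""
    return result.get("questionnaire"), result.get("render")

qid, fid = outcome_ids(json.loads(payload))
# qid would be resolved with the Specification API, fid with the File API;
# either may be None, depending on the save and render modes.
```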
Domain Logic
At the heart of the generation process is the algorithm that yields a questionnaire object. In outline:
starting from the questionnaire specs, we delegate generation to the specs of each component.
in each component, we produce a corresponding instance object: selections, layout, and parameter set.
back in the questionnaire specs, we compose all the objects into the final questionnaire object.
The same "delegate-and-collect" approach is repeated inside components like selections and layout, which also have a deep structure.
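The "delegate-and-collect" approach can be sketched as follows. The class names and instance shapes are hypothetical, kept minimal to show the pattern: each component spec generates its own instance, and the questionnaire spec composes the collected instances.

```python
# Minimal sketch of "delegate-and-collect"; names and shapes are illustrative.

class ComponentSpec:
    def __init__(self, name):
        self.name = name

    def generate(self, context):
        # each component spec yields a corresponding instance object
        return {"component": self.name, "params": context.get("params", {})}

class QuestionnaireSpec:
    def __init__(self, components):
        self.components = components

    def generate(self, context):
        # delegate to each component spec, then collect and compose
        instances = [spec.generate(context) for spec in self.components]
        return {"questionnaire": instances}

spec = QuestionnaireSpec([ComponentSpec("selections"), ComponentSpec("layout")])
questionnaire = spec.generate({"params": {"year": 2024}})
```

The same pattern recurs one level down: a selections or layout spec plays the "questionnaire" role for its own children.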
In all the operations:
we validate the specs, for extra assurance that what they generate will be sound from a domain perspective.
With this just-in-time validation we complement the regular validation performed during the specification lifecycle.
we rely heavily on a shared generation context, a container of:
external inputs, such as the parameters and the statistics to load.
intermediate results on the state of the process (indices, accumulators, etc.).
information providers, such as codelist and parameter resolvers.
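The shared generation context can be sketched as a plain container along these lines. Field names are illustrative, not the actual model; they just mirror the three groups listed above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the shared generation context described above.
@dataclass
class GenerationContext:
    # external inputs
    parameters: dict = field(default_factory=dict)
    statistics: list = field(default_factory=list)
    # intermediate results and process state
    spec_index: dict = field(default_factory=dict)
    selection_store: dict = field(default_factory=dict)
    # information providers
    codelist_resolver: object = None
    parameter_resolver: object = None

ctx = GenerationContext(parameters={"year": "2024"})
```

A single mutable context threaded through the recursion keeps the generation operations free of network and database access, as the narrower view of the process requires.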
The generation logic for parameter sets is very simple, essentially just a validation exercise of the parameters in the generation context.
Slightly more complex is the logic required to generate layouts, and even more complex is the logic required to generate selections. We discuss both below.
Layout Generation
When we generate a layout, we recursively traverse its specs, from the layout itself down to its components and their properties. As we move through the layout we:
copy types from component specs to components.
copy types and names from property specs to properties.
resolve any parameter reference we may encounter in the values of property specs.
All this recursion aside, layout generation is fairly simple: we basically make a copy of the layout spec with resolved parameters.
In implementation terms, however, the changes are more significant: we move from a static model of diverse specification objects (different components or properties have different models) to a dynamic model of generic instance objects (all components and properties have the same model).
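The move from diverse spec objects to generic instance objects can be sketched as a recursive copy. The node shapes and the `${name}` parameter syntax are assumptions made for the example, not the actual model.

```python
import re

# Hypothetical sketch: walk a layout spec tree and emit generic instance
# objects (all components and properties share one shape), resolving
# ${...} parameter references in property values along the way.

def resolve(value, params):
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), value)

def generate_component(spec, params):
    return {
        "type": spec["type"],  # copied from the component spec
        "properties": [
            # copy type and name, resolve parameter references in values
            {"type": p["type"], "name": p["name"], "value": resolve(p["value"], params)}
            for p in spec.get("properties", [])
        ],
        "children": [generate_component(c, params) for c in spec.get("children", [])],
    }

layout = generate_component(
    {"type": "table",
     "properties": [{"type": "string", "name": "title", "value": "Trade ${year}"}]},
    {"year": 2024},
)
```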
Selection Generation
We generate selections in a recursive fashion, moving down the selection composition hierarchy with the generation context at hand.
We don't proceed depth-first, however. We move instead dep-first, i.e. generate dependencies before we generate dependents.
Note: for composite selections, there's not much difference really: the dependencies are the parts. For base selections, we must look at explicit dependencies and follow those first.
To go dep-first, we need two things:
an index of all selection specs in the hierarchy, so that we can look up the specs of dependencies from their dependents.
We build the index as a preliminary step, with a one-off depth-first traversal of the hierarchy. In the index entries, we include the selection descriptors that spec objects include or inherit from their ancestors. This means that we always know how to interpret and process the dependencies that we look up, even if we haven't arrived there from their parents.
a store of all the selections we've generated so far, so that we can look up the dependencies we've already generated, using the index above.
We also use the store to detect sharing and avoid generating the same selection twice.
We keep both the spec index and the selection store in the generation context.
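Dep-first generation with the index and the store can be sketched as follows. The spec shapes and the `deps` key are hypothetical; the point is that the store memoises generated selections, so shared dependencies are generated exactly once.

```python
# Hypothetical sketch of dep-first selection generation: the index maps
# names to specs, the store memoises generated selections.

def generate_selection(name, index, store):
    if name in store:  # already generated (a shared dependency)
        return store[name]
    spec = index[name]
    # dep-first: generate dependencies (or parts) before the dependent
    deps = [generate_selection(d, index, store) for d in spec.get("deps", [])]
    selection = {"name": name, "deps": deps}
    store[name] = selection
    return selection

index = {
    "a": {"deps": []},
    "b": {"deps": ["a"]},
    "c": {"deps": ["a", "b"]},  # "a" is shared between "b" and "c"
}
store = {}
result = generate_selection("c", index, store)
```

Here the shared dependency "a" is generated once and reused, which is exactly the sharing detection the store provides.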
This is all book-keeping; the real action occurs in the specs of base selections. Here we proceed once again in recursive fashion, this time by induction on the N dimensions of the dataset. This means that:
we first build the tail selection, i.e. the selection with N-1 of the original dimensions.
then we extend it with the range of the N-th dimension, effectively adding each of its coordinates to the data points of the tail selection.
as a base case, we generate the empty selection, i.e. the selection with 0 dimensions.
With the recursion above, we compute the cartesian product of all the dimension ranges in the base selection. As we consider each coordinate of the "current" dimension in step 2, we also:
resolve any parameter reference that may be specified in its code (with the parameter resolver).
if the coordinate is a root, expand the process to include all the descendants of its code (in the codelist hierarchy).
consult the coordinate metadata specifications of the dimension and use them to extract and collect relevant code metadata from the underlying codelist (with the codelist resolver). At the end of the process, we record the collected metadata on the generated selection, so that it can be used at rendering time.
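The inductive construction above can be sketched as follows, leaving out parameter resolution, root expansion, and metadata collection. Data points are modelled as plain tuples for illustration.

```python
# Sketch of the induction on N dimensions: the base case is the empty
# selection (one empty data point); each step extends the tail selection
# with every coordinate of the N-th dimension.

def build_selection(ranges):
    if not ranges:  # 0 dimensions: the empty selection
        return [()]
    tail = build_selection(ranges[:-1])  # selection over the first N-1 dimensions
    # extend each tail data point with each coordinate of the N-th dimension
    return [point + (coord,) for point in tail for coord in ranges[-1]]

# the cartesian product of the dimension ranges
points = build_selection([["IT", "FR"], ["2023", "2024"]])
```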