Selections

Selections are subsets of the statistics in one or more multi-dimensional datasets.

There are two types of selection:

composite selections group other selections, called their parts. Parts may be composite in turn and may originate from different datasets.
base selections hold the data points of a single dataset and have no further structure.

Composite selections serve as "configuration points" for one or more of their parts. For example, they may group base selections over the same dataset, or group in turn all the same-dataset composite selections that are required for a particular questionnaire.

Note: questionnaires are defined over one selection. This is the top selection.

Base selections may also store coordinate metadata, i.e. metadata about the coordinates of their data points. As coordinates typically recur across data points, we store coordinate metadata outside the data points, indexing it first by the dimension of the coordinate and then by its code.
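As a minimal sketch of this two-level index, assuming a plain nested mapping (the dimension names, codes, metadata fields and the `lookup_metadata` helper are hypothetical, chosen only for illustration):

```python
from typing import Any

# Assumed layout: dimension -> code -> metadata about that coordinate.
CoordinateMetadata = dict[str, dict[str, dict[str, Any]]]

# Hypothetical metadata, stored once per coordinate rather than once per data point.
coordinate_metadata: CoordinateMetadata = {
    "area": {
        "A1": {"region": "north"},
        "A2": {"region": "south"},
    },
    "item": {
        "I7": {"unit": "tonnes"},
    },
}


def lookup_metadata(index: CoordinateMetadata, dimension: str, code: str) -> dict[str, Any]:
    """Return the metadata stored for a coordinate, or an empty dict if there is none."""
    return index.get(dimension, {}).get(code, {})


area_metadata = lookup_metadata(coordinate_metadata, "area", "A2")
```

Because the index is keyed by dimension first, the same code can appear under two dimensions without clashing.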

All selections:

are uniquely identified within questionnaires.
may have a descriptor that names their dataset and lists its dimensions in the order required to interpret the data points of the selections (dimension order).

The descriptors of composite selections apply to their base selections, provided these don't have one of their own.

Note: composite selections exist precisely to enable this propagation.
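The propagation rule can be sketched as a walk down the selection tree. This is only an illustration: the `Descriptor` and `Selection` classes and the `effective_descriptors` function are hypothetical names, a base selection is modelled simply as one without parts, and we assume that with nested composites the nearest descriptor wins.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Descriptor:
    dataset: str
    dimensions: list[str]  # dimension order


@dataclass
class Selection:
    id: str
    descriptor: Optional[Descriptor] = None
    parts: list[Selection] = field(default_factory=list)  # empty for base selections


def effective_descriptors(
    selection: Selection, inherited: Optional[Descriptor] = None
) -> dict[str, Optional[Descriptor]]:
    """Map the id of each base selection to the descriptor that applies to it."""
    # A selection's own descriptor takes precedence over the inherited one.
    current = selection.descriptor or inherited
    if not selection.parts:
        return {selection.id: current}
    resolved: dict[str, Optional[Descriptor]] = {}
    for part in selection.parts:
        resolved.update(effective_descriptors(part, current))
    return resolved


# Hypothetical top selection: "b" has its own descriptor, "a" and "c" inherit.
shared = Descriptor("crops", ["area", "item", "year"])
own = Descriptor("trade", ["area", "year"])
top = Selection(
    "top",
    shared,
    [Selection("a"), Selection("b", own), Selection("group", None, [Selection("c")])],
)
resolved = effective_descriptors(top)
```

Here `a` and `c` resolve to the descriptor of `top`, while `b` keeps its own.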

Data Points

A data point of a base selection:

includes a list of coordinates along all the dimensions of the dataset. A coordinate is a pair (name, code), where:
- the code is drawn from the codelist of the corresponding dimension.
- the name is in the language of the questionnaire that contains the data point.
may include a numeric observation. This is the actual statistic recorded at the data point.
may include a set of flags for the observation drawn from the flaglist underneath the flag profiles of the dataset.

Note: data points do not mention the dimensions of their coordinates. To process them, we need the descriptor of the base selection.
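To illustrate why the descriptor is needed, the sketch below pairs the coordinates of a data point with dimensions purely by position, using the dimension order. The `Coordinate` type, the `label_coordinates` helper, the dictionary layout of the data point and all sample values are assumptions made for this example.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    name: str  # label in the language of the questionnaire
    code: str  # drawn from the codelist of the dimension


def label_coordinates(
    dimension_order: list[str], coordinates: list[Coordinate]
) -> dict[str, Coordinate]:
    """Pair each coordinate with its dimension, relying on position alone."""
    if len(dimension_order) != len(coordinates):
        raise ValueError("a data point needs one coordinate per dimension")
    return dict(zip(dimension_order, coordinates))


# Hypothetical dimension order (from the descriptor) and data point.
dimension_order = ["area", "item", "year"]
data_point = {
    "coordinates": [
        Coordinate("Area one", "A1"),
        Coordinate("Wheat", "I7"),
        Coordinate("2020", "2020"),
    ],
    "observation": 12.5,
    "flags": ["E"],
}
labelled = label_coordinates(dimension_order, data_point["coordinates"])
```

Without the dimension order, nothing in the data point says that `I7` is an item rather than an area.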

Specifications

Selections are generated from selection specifications.

Specifications carry the identifiers and descriptors of selections and propagate them to the selections they generate.

Like selections, specifications can be composite or base. Unsurprisingly:

composite specifications aggregate other specifications and generate composite selections.
base specifications generate base selections and define in synthetic fashion what data points they may have.
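The correspondence between specifications and selections can be sketched as a recursive function. Everything here is hypothetical (the `Specification` and `Selection` classes, the `generate` function, the use of plain dictionaries for descriptors), and the generation of data points for base selections is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Selection:
    id: str
    descriptor: Optional[dict] = None
    parts: list[Selection] = field(default_factory=list)
    data_points: list[dict] = field(default_factory=list)


@dataclass
class Specification:
    id: str
    descriptor: Optional[dict] = None
    parts: list[Specification] = field(default_factory=list)  # empty for base specifications


def generate(spec: Specification) -> Selection:
    """Generate a selection that mirrors the structure of its specification."""
    # The identifier and the descriptor are carried over unchanged.
    selection = Selection(spec.id, spec.descriptor)
    # A composite specification yields one part per child specification.
    # A base specification yields no parts; its data points are not generated in this sketch.
    selection.parts = [generate(part) for part in spec.parts]
    return selection


# Hypothetical composite specification with two base specifications.
spec = Specification(
    "top",
    {"dataset": "crops", "dimensions": ["area", "item", "year"]},
    [Specification("a"), Specification("b")],
)
generated = generate(spec)
```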

Base specifications contain the directives that drive most of the generation logic. They include:

a list of dimension specifications, each of which specifies a set of constraints on the coordinates of data points along a given dimension of the dataset. Constraints include:
- a list of allowed coordinates for this dimension. We speak of the range of the dimension and also say that a coordinate is "in range" for it.
- a subset of the range made of root coordinates. Any coordinate which is "below" a root in the dimension hierarchy is also in range for this dimension.
- a set of coordinate overrides, i.e. a replacement for the range of the next dimension to consider in correspondence of given coordinates in this range.
- a list of identifiers of other selections in the same questionnaire. These selections serve as dependencies of this selection and must be generated before it. Dependencies can influence the generation of this selection, primarily for anchoring (see below).
- a list of coordinate metadata specifications, i.e. labelled paths that identify items of interest into the JSON representation of any metadata available about the coordinates in range.
- a set of properties related to selection anchoring, i.e. the set of rules that govern the inclusion of data points inside selections based on currently available statistics:
  - an optional subset of the range made of anchor coordinates. Any data point that contains an anchor is included in the selection even if it does not meet other inclusion criteria.
  - a list of anchoring coordinates that serve as the dimension range for an auxiliary selection called the anchoring selection. The anchoring selection forces some data points into this selection even when they do not meet other inclusion criteria.
  - a set of anchoring dependency identifiers. These are other selections in the same questionnaire that force some data points into this selection even when they do not meet other inclusion criteria.
  We discuss selection anchoring in more detail elsewhere, where we describe how we load statistics inside freshly generated questionnaire objects.
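The range, root and override constraints above can be sketched as follows. This is an illustration under assumptions, not the actual generation logic: the dimension hierarchy is modelled as a child-to-parent mapping, overrides as a mapping from a coordinate code to the replacement range of the next dimension, and all function names and codes are hypothetical.

```python
from typing import Optional


def is_below(code: str, root: str, parent_of: dict[str, str]) -> bool:
    """True if `code` is a strict descendant of `root` in the dimension hierarchy."""
    seen: set[str] = set()
    current: Optional[str] = parent_of.get(code)
    # Walk up the hierarchy; `seen` guards against accidental cycles.
    while current is not None and current not in seen:
        if current == root:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


def in_range(code: str, allowed: set[str], roots: set[str], parent_of: dict[str, str]) -> bool:
    """A coordinate is in range if it is listed, or if it sits below a root."""
    return code in allowed or any(is_below(code, root, parent_of) for root in roots)


def next_range(code: str, overrides: dict[str, set[str]], default: set[str]) -> set[str]:
    """Range of the next dimension, replaced when `code` has an override."""
    return overrides.get(code, default)


# Hypothetical hierarchy: R1 has children R1.1 and R1.2; R2 is a sibling of R1.
parent_of = {"R1": "ALL", "R1.1": "R1", "R1.2": "R1", "R2": "ALL"}
allowed = {"R1", "R2"}
roots = {"R1"}  # a subset of the range

r11_in_range = in_range("R1.1", allowed, roots, parent_of)
all_in_range = in_range("ALL", allowed, roots, parent_of)
years_for_r2 = next_range("R2", {"R2": {"2020"}}, {"2019", "2020"})
```

With these inputs `R1.1` is in range because it sits below the root `R1`, `ALL` is not, and the override narrows the next dimension to `2020` for data points at `R2`.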

