Collecting aggregations

All computational results are collected in a qp.Stack object, which acts as a container for a large number of aggregations in the form of qp.Links.

What is a `qp.Link`?

A qp.Link is defined by four attributes that make it unique and set how it is stored in a qp.Stack. These four attributes are data_key, filter, x (downbreak) and y (crossbreak), which are positioned in a qp.Stack similar to a tree diagram:

Each Stack can have various data_keys.

Each data_key can have various filters.

Each filter can have various xs.

Each x can have various ys.

Consequently, qp.Stack[dk][filter][x][y] is one qp.Link, which can be added using add_link(self, data_keys=None, filters=['no_filter'], x=None, y=None, ...).

qp.Links store different qp.Views (frequencies, statistics, and all other kinds of computations) that are applied to the same four data attributes.
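The tree layout described above can be pictured with plain nested dictionaries. The following is only a minimal sketch of the idea, not Quantipy's implementation: the helper `toy_add_link` and the view name `'counts'` are hypothetical, and a real qp.Link holds qp.View objects rather than strings.

```python
def toy_add_link(stack, data_key, filter_name, x, y):
    """Create the nested path data_key -> filter -> x -> y if it is missing
    and return the innermost dict, which stands in for a Link."""
    return (stack.setdefault(data_key, {})
                 .setdefault(filter_name, {})
                 .setdefault(x, {})
                 .setdefault(y, {}))


toy_stack = {}
link = toy_add_link(toy_stack, 'Example Data (A)', 'no_filter', 'q2b', 'gender')

# A Link collects views computed on the same four attributes
# ('counts' is a made-up view name used only for illustration).
link['counts'] = 'placeholder for a computed view'

# The same Link is reachable through the four keys, in tree order.
same_link = toy_stack['Example Data (A)']['no_filter']['q2b']['gender']
```

Because every level is keyed by one of the four attributes, two Links can never share the same combination of data_key, filter, x and y.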

Populating a `qp.Stack`

A qp.Stack can hold a large number of aggregations, so it is impractical to add Links one by one with repeated Stack.add_link() calls. It is much easier to create a “construction plan” using a qp.Batch and apply the settings saved in DataSet._meta['sets']['batches'] to populate a qp.Stack instance. In the following, assume that dataset contains the definitions of two qp.Batches. A qp.Stack can then be created by running:

stack = dataset.populate(batches='all')

For the Batch definitions from here, you will get the following construction plans:

>>> batch1 = dataset.get_batch('batch1')
>>> batch1.add_y_on_y('y_keys')

>>> print batch1.x_y_map
OrderedDict([('q1', ['@', 'gender', 'q1', 'locality', 'ethnicity']),
             ('q2', ['locality', 'ethnicity']),
             ('q6', ['@']),
             ('@', ['q6']),
             (u'q6_1', ['@', 'gender', 'q1']),
             (u'q6_2', ['@', 'gender', 'q1']),
             (u'q6_3', ['@', 'gender', 'q1'])])

>>> print batch1.x_filter_map
OrderedDict([('q1', {'(men only)+(q1)': (<function _intersection at 0x0000000019AE06D8>, [{'gender': 1}, {'age': [20, 21, 22, 23, 24, 25]}])}),
             ('q2', {'men only': {'gender': 1}}),
             ('q6', {'men only': {'gender': 1}}),
             ('q6_1', {'men only': {'gender': 1}}),
             ('q6_2', {'men only': {'gender': 1}}),
             ('q6_3', {'men only': {'gender': 1}})])

>>> batch2 = dataset.get_batch('batch2')

>>> print batch2.x_y_map
OrderedDict([('q2b', ['@', 'gender'])])

>>> print batch2.x_filter_map
OrderedDict([('q2b', 'no_filter')])

As both Batches refer to the same data file, the same data_key (in this case the name of dataset) defines all Links.
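To make the relationship between a construction plan and the resulting Links more concrete, the sketch below expands an x_y_map and an x_filter_map into (data_key, filter, x, y) tuples. It is an illustration in plain Python, not the code Quantipy runs: the function `expand_plan` is hypothetical, and it deliberately ignores the y_on_y setting and any x key (such as '@') that has no entry in the x_filter_map.

```python
from collections import OrderedDict


def expand_plan(data_key, x_y_map, x_filter_map):
    """Return one (data_key, filter_name, x, y) tuple per planned Link."""
    links = []
    for x, ys in x_y_map.items():
        filter_def = x_filter_map[x]
        # A filter entry is either the plain string 'no_filter' or a dict
        # that maps filter names to their logic; only the names matter here.
        if isinstance(filter_def, dict):
            names = list(filter_def)
        else:
            names = [filter_def]
        for name in names:
            for y in ys:
                links.append((data_key, name, x, y))
    return links


# The plan of batch2 as printed above.
batch2_links = expand_plan(
    'Example Data (A)',
    OrderedDict([('q2b', ['@', 'gender'])]),
    OrderedDict([('q2b', 'no_filter')]))
```

For batch2 this yields two tuples, one for each y of q2b, both under 'no_filter'.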

After populating, the Stack content can be viewed using .describe():

>>> stack.describe()
            data           filter       x          y  view  #
Example Data (A)         men only      q1         q1   NaN  1
Example Data (A)         men only      q1          @   NaN  1
Example Data (A)         men only      q1     gender   NaN  1
Example Data (A)         men only       @         q6   NaN  1
Example Data (A)         men only      q2  ethnicity   NaN  1
Example Data (A)         men only      q2   locality   NaN  1
Example Data (A)         men only    q6_1         q1   NaN  1
Example Data (A)         men only    q6_1          @   NaN  1
Example Data (A)         men only    q6_1     gender   NaN  1
Example Data (A)         men only    q6_2         q1   NaN  1
Example Data (A)         men only    q6_2          @   NaN  1
Example Data (A)         men only    q6_2     gender   NaN  1
Example Data (A)         men only    q6_3         q1   NaN  1
Example Data (A)         men only    q6_3          @   NaN  1
Example Data (A)         men only    q6_3     gender   NaN  1
Example Data (A)         men only  gender         q1   NaN  1
Example Data (A)         men only  gender          @   NaN  1
Example Data (A)         men only  gender     gender   NaN  1
Example Data (A)         men only      q6          @   NaN  1
Example Data (A)  (men only)+(q1)      q1         q1   NaN  1
Example Data (A)  (men only)+(q1)      q1          @   NaN  1
Example Data (A)  (men only)+(q1)      q1   locality   NaN  1
Example Data (A)  (men only)+(q1)      q1  ethnicity   NaN  1
Example Data (A)  (men only)+(q1)      q1     gender   NaN  1
Example Data (A)        no_filter     q2b          @   NaN  1
Example Data (A)        no_filter     q2b     gender   NaN  1

You can find all combinations defined in the x_y_map in the Stack structure, but Links like Stack['Example Data (A)']['men only']['gender']['gender'] are also included. These special cases arise from the y_on_y setting. Sometimes it is helpful to group a describe-dataframe and create a cross-tabulation of the four Link attributes to get a better overview, e.g. to see how many Links are included for each x-filter combination:

>>> stack.describe('x', 'filter')
filter  (men only)+(q1)  men only  no_filter
x
@                   NaN       1.0        NaN
gender              NaN       3.0        NaN
q1                  5.0       3.0        NaN
q2                  NaN       2.0        NaN
q2b                 NaN       NaN        2.0
q6                  NaN       1.0        NaN
q6_1                NaN       3.0        NaN
q6_2                NaN       3.0        NaN
q6_3                NaN       3.0        NaN
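The numbers in this cross-tabulation are simply counts of Links per x-filter pair. As a rough illustration in plain Python (not how describe() is implemented), the same counts can be derived from a list of (filter, x, y) tuples; the list below copies only the q1 and q2b rows from the describe() output shown earlier.

```python
from collections import Counter

# (filter, x, y) for the q1 and q2b Links listed by stack.describe()
links = [
    ('men only', 'q1', 'q1'),
    ('men only', 'q1', '@'),
    ('men only', 'q1', 'gender'),
    ('(men only)+(q1)', 'q1', 'q1'),
    ('(men only)+(q1)', 'q1', '@'),
    ('(men only)+(q1)', 'q1', 'locality'),
    ('(men only)+(q1)', 'q1', 'ethnicity'),
    ('(men only)+(q1)', 'q1', 'gender'),
    ('no_filter', 'q2b', '@'),
    ('no_filter', 'q2b', 'gender'),
]

# Count how many Links exist for each (x, filter) combination.
links_per_x_filter = Counter((x, flt) for flt, x, y in links)
```

Pairs that never occur, such as q2b under 'men only', have a count of zero here, which corresponds to the NaN cells in the table.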
