Collecting aggregations

All computational results are collected in a so-called qp.Stack object, which acts as a container for a large number of aggregations in the form of qp.Links.
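The Link addressing used later in this section, Stack['Example Data (A)']['men only']['gender']['gender'], suggests a four-level nesting: data key, filter, x variable, y variable. The following minimal sketch models that layout with plain nested dictionaries. It is only an illustration and not Quantipy's qp.Stack implementation. The `add_link` helper and the placeholder strings that stand in for real Link objects are hypothetical.

```python
from collections import defaultdict


def nested():
    """Return an auto-vivifying dict: missing levels are created on first access."""
    return defaultdict(nested)


def add_link(stack, data_key, filter_key, x, y, link):
    """Store a link under stack[data_key][filter_key][x][y].

    Hypothetical helper for illustration; not part of Quantipy's API.
    """
    stack[data_key][filter_key][x][y] = link


stack = nested()
# Placeholder strings stand in for real qp.Link objects (assumption).
add_link(stack, 'Example Data (A)', 'men only', 'gender', 'gender', 'link: gender x gender')
add_link(stack, 'Example Data (A)', 'men only', 'q1', '@', 'link: q1 x @')

print(stack['Example Data (A)']['men only']['gender']['gender'])
# link: gender x gender
```

Because each level is keyed by name, a single Link is reached by chaining four lookups, in the same way as the Stack path quoted above.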

Populating a qp.Stack

A qp.Stack can hold a large number of aggregations, so adding Links one by one with repeated Stack.add_link() calls is impractical. It is much easier to create a “construction plan” using a qp.Batch and apply the settings saved in DataSet._meta['sets']['batches'] to populate a qp.Stack instance. In the following, let’s assume that dataset contains the definitions of two qp.Batches. A qp.Stack can then be created by running:

stack = dataset.populate(batches='all')
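One way to picture the batches='all' argument is as a lookup into the Batch settings stored in DataSet._meta['sets']['batches'], which picks every stored definition (or only some of them) before the Stack is built. The sketch below illustrates that idea under stated assumptions. The reduced meta layout and the `select_batches` helper are hypothetical, and this is not how DataSet.populate() is actually implemented.

```python
def select_batches(meta, batches='all'):
    """Return the stored Batch definitions a Stack would be built from.

    Hypothetical helper: ``batches`` is 'all', a single Batch name,
    or a list of Batch names.
    """
    stored = meta['sets']['batches']
    if batches == 'all':
        names = list(stored)
    elif isinstance(batches, str):
        names = [batches]
    else:
        names = list(batches)
    missing = [name for name in names if name not in stored]
    if missing:
        raise KeyError('Unknown batches: {}'.format(missing))
    return {name: stored[name] for name in names}


# Assumed, heavily reduced layout of DataSet._meta (illustration only).
meta = {
    'sets': {
        'batches': {
            'batch1': {'x_y_map': {'q2': ['locality', 'ethnicity']}},
            'batch2': {'x_y_map': {'q2b': ['@', 'gender']}},
        }
    }
}

print(sorted(select_batches(meta, batches='all')))     # ['batch1', 'batch2']
print(sorted(select_batches(meta, batches='batch2')))  # ['batch2']
```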

For the Batch definitions from here, you will get the following construction plans:

>>> batch1 = dataset.get_batch('batch1')
>>> batch1.add_y_on_y('y_keys')
>>> print batch1.x_y_map
OrderedDict([('q1', ['@', 'gender', 'q1', 'locality', 'ethnicity']),
             ('q2', ['locality', 'ethnicity']),
             ('q6', ['@']),
             ('@', ['q6']),
             (u'q6_1', ['@', 'gender', 'q1']),
             (u'q6_2', ['@', 'gender', 'q1']),
             (u'q6_3', ['@', 'gender', 'q1'])])
>>> print batch1.x_filter_map
OrderedDict([('q1', {'(men only)+(q1)': (<function _intersection at 0x0000000019AE06D8>, [{'gender': 1}, {'age': [20, 21, 22, 23, 24, 25]}])}),
             ('q2', {'men only': {'gender': 1}}),
             ('q6', {'men only': {'gender': 1}}),
             ('q6_1', {'men only': {'gender': 1}}),
             ('q6_2', {'men only': {'gender': 1}}),
             ('q6_3', {'men only': {'gender': 1}})])
>>> batch2 = dataset.get_batch('batch2')
>>> print batch2.x_y_map
OrderedDict([('q2b', ['@', 'gender'])])
>>> print batch2.x_filter_map
OrderedDict([('q2b', 'no_filter')])

As both Batches refer to the same data file, the same data_key (in this case the name of the dataset, 'Example Data (A)') defines all Links.
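Because both construction plans share one data_key, all of their Links end up under the same top-level key. The sketch below expands reduced copies of the two plans shown above into (data, filter, x, y) tuples. The `expand_plan` helper is hypothetical. Filters are reduced to their names, and the extra Links produced by add_y_on_y() are not modelled.

```python
def expand_plan(data_key, x_y_map, x_filter_map):
    """Turn one Batch construction plan into (data, filter, x, y) Link keys.

    Hypothetical helper: each x_filter_map value is either the string
    'no_filter' or a dict whose single key is the filter name.
    """
    keys = []
    for x, ys in x_y_map.items():
        filt = x_filter_map[x]
        filter_name = filt if isinstance(filt, str) else next(iter(filt))
        for y in ys:
            keys.append((data_key, filter_name, x, y))
    return keys


data_key = 'Example Data (A)'
# Reduced excerpt of batch1's plan (only q2 and q6).
batch1_links = expand_plan(
    data_key,
    {'q2': ['locality', 'ethnicity'], 'q6': ['@']},
    {'q2': {'men only': {'gender': 1}}, 'q6': {'men only': {'gender': 1}}},
)
# batch2's plan as printed above.
batch2_links = expand_plan(data_key, {'q2b': ['@', 'gender']}, {'q2b': 'no_filter'})

links = batch1_links + batch2_links
print(len(links))                   # 5
print({key[0] for key in links})    # {'Example Data (A)'}
```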

After populating, the Stack's content can be viewed using .describe():

>>> stack.describe()
                data           filter       x          y  view  #
0   Example Data (A)         men only      q1         q1   NaN  1
1   Example Data (A)         men only      q1          @   NaN  1
2   Example Data (A)         men only      q1     gender   NaN  1
3   Example Data (A)         men only       @         q6   NaN  1
4   Example Data (A)         men only      q2  ethnicity   NaN  1
5   Example Data (A)         men only      q2   locality   NaN  1
6   Example Data (A)         men only    q6_1         q1   NaN  1
7   Example Data (A)         men only    q6_1          @   NaN  1
8   Example Data (A)         men only    q6_1     gender   NaN  1
9   Example Data (A)         men only    q6_2         q1   NaN  1
10  Example Data (A)         men only    q6_2          @   NaN  1
11  Example Data (A)         men only    q6_2     gender   NaN  1
12  Example Data (A)         men only    q6_3         q1   NaN  1
13  Example Data (A)         men only    q6_3          @   NaN  1
14  Example Data (A)         men only    q6_3     gender   NaN  1
15  Example Data (A)         men only  gender         q1   NaN  1
16  Example Data (A)         men only  gender          @   NaN  1
17  Example Data (A)         men only  gender     gender   NaN  1
18  Example Data (A)         men only      q6          @   NaN  1
19  Example Data (A)  (men only)+(q1)      q1         q1   NaN  1
20  Example Data (A)  (men only)+(q1)      q1          @   NaN  1
21  Example Data (A)  (men only)+(q1)      q1   locality   NaN  1
22  Example Data (A)  (men only)+(q1)      q1  ethnicity   NaN  1
23  Example Data (A)  (men only)+(q1)      q1     gender   NaN  1
24  Example Data (A)        no_filter     q2b          @   NaN  1
25  Example Data (A)        no_filter     q2b     gender   NaN  1
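Each row of the describe() output corresponds to one data/filter/x/y path in the Stack. As a rough illustration, the sketch below walks a small nested-dict model of a Stack and yields one such row per Link. The `describe_rows` helper and the dict layout are hypothetical. The view and # columns of the real output are left out.

```python
def describe_rows(stack):
    """Yield one (data, filter, x, y) tuple per Link in a nested-dict Stack model."""
    for data_key, filters in stack.items():
        for filter_key, xs in filters.items():
            for x, ys in xs.items():
                for y in ys:
                    yield (data_key, filter_key, x, y)


# Reduced, hypothetical model of the populated Stack; None stands in for a Link.
stack = {
    'Example Data (A)': {
        'men only': {'q2': {'ethnicity': None, 'locality': None}},
        'no_filter': {'q2b': {'@': None, 'gender': None}},
    }
}

for row in describe_rows(stack):
    print(row)
# ('Example Data (A)', 'men only', 'q2', 'ethnicity')
# ('Example Data (A)', 'men only', 'q2', 'locality')
# ('Example Data (A)', 'no_filter', 'q2b', '@')
# ('Example Data (A)', 'no_filter', 'q2b', 'gender')
```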

All combinations defined in the x_y_map can be found in the Stack structure, but Links such as Stack['Example Data (A)']['men only']['gender']['gender'] are included as well. These special cases arise from the y_on_y setting. To get a better overview, it is sometimes helpful to group the describe() dataframe and cross-tabulate two of the four Link attributes (data, filter, x, y), e.g. to see how many Links are included for each x-filter combination:

>>> stack.describe('x', 'filter')
filter  (men only)+(q1)  men only  no_filter
x
@                   NaN       1.0        NaN
gender              NaN       3.0        NaN
q1                  5.0       3.0        NaN
q2                  NaN       2.0        NaN
q2b                 NaN       NaN        2.0
q6                  NaN       1.0        NaN
q6_1                NaN       3.0        NaN
q6_2                NaN       3.0        NaN
q6_3                NaN       3.0        NaN
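The cross-tabulation above counts Links per (x, filter) pair. A pure-Python sketch of the same count follows. It uses the filter and x columns of the 26 rows listed by stack.describe() earlier. The `count_links` helper is hypothetical and does not reproduce the NaN-filled table layout that pandas produces.

```python
from collections import Counter


def count_links(rows):
    """Count Links per (x, filter) pair from (filter, x) rows (hypothetical helper)."""
    return Counter((x, filter_key) for filter_key, x in rows)


# (filter, x) pairs of rows 0-25 of the stack.describe() output.
rows = (
    [('men only', 'q1')] * 3
    + [('men only', '@')]
    + [('men only', 'q2')] * 2
    + [('men only', 'q6_1')] * 3
    + [('men only', 'q6_2')] * 3
    + [('men only', 'q6_3')] * 3
    + [('men only', 'gender')] * 3
    + [('men only', 'q6')]
    + [('(men only)+(q1)', 'q1')] * 5
    + [('no_filter', 'q2b')] * 2
)

counts = count_links(rows)
print(counts[('q1', 'men only')], counts[('q1', '(men only)+(q1)')])  # 3 5
print(counts[('q2b', 'no_filter')])                                   # 2
```

A missing pair such as ('q2b', 'men only') simply counts as 0 here. In the pandas cross-tabulation, the same cell appears as NaN.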