View aggregation

All of the following examples work with a qp.Stack that was populated from a qp.DataSet containing the following qp.Batch definitions:

>>> batch1 = dataset.get_batch('batch1')
>>> batch1.add_y_on_y('y_keys')

>>> print batch1.x_y_map
OrderedDict([('q1', ['@', 'gender', 'q1', 'locality', 'ethnicity']),
             ('q2', ['locality', 'ethnicity']),
             ('q6', ['@']),
             ('@', ['q6']),
             (u'q6_1', ['@', 'gender', 'q1']),
             (u'q6_2', ['@', 'gender', 'q1']),
             (u'q6_3', ['@', 'gender', 'q1'])])

>>> print batch1.x_filter_map
OrderedDict([('q1', {'(men only)+(q1)': (<function _intersection at 0x0000000019AE06D8>, [{'gender': 1}, {'age': [20, 21, 22, 23, 24, 25]}])}),
             ('q2', {'men only': {'gender': 1}}),
             ('q6', {'men only': {'gender': 1}}),
             ('q6_1', {'men only': {'gender': 1}}),
             ('q6_2', {'men only': {'gender': 1}}),
             ('q6_3', {'men only': {'gender': 1}})])

>>> print batch1.weights
['weight_a']

>>> batch2 = dataset.get_batch('batch2')

>>> print batch2.x_y_map
OrderedDict([('q2b', ['@', 'gender'])])

>>> print batch2.x_filter_map
OrderedDict([('q2b', 'no_filter')])

>>> print batch2.weights
['weight']
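
One way to read these definitions is that every x-key of an x_y_map is crossed with each of its y-keys. The following plain-Python sketch enumerates those combinations for batch1. The helper iter_xy_pairs is hypothetical and not part of Quantipy, and the sketch does not reproduce how the Stack is actually populated (for example, the additional y_on_y Links are not derived here).

```python
from collections import OrderedDict

# Copy of batch1.x_y_map as printed above.
x_y_map = OrderedDict([
    ('q1', ['@', 'gender', 'q1', 'locality', 'ethnicity']),
    ('q2', ['locality', 'ethnicity']),
    ('q6', ['@']),
    ('@', ['q6']),
    ('q6_1', ['@', 'gender', 'q1']),
    ('q6_2', ['@', 'gender', 'q1']),
    ('q6_3', ['@', 'gender', 'q1']),
])


def iter_xy_pairs(mapping):
    """Yield every (x, y) combination implied by an x_y_map (hypothetical helper)."""
    for x, ys in mapping.items():
        for y in ys:
            yield (x, y)


pairs = list(iter_xy_pairs(x_y_map))
print(len(pairs))
print(pairs[:3])
```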

Basic views

Various qp.Views can be added to a Link. One way is to run Stack.add_link() and pass View objects via the view parameter. Alternatively, the qp.Batch definitions stored in the metadata can be used to add basic Views (counts, percentages, bases and sums): simply running Stack.aggregate() adds a large number of aggregations in one step.

Note

Stack.aggregate() can only be used with pre-populated Stacks! (see DataSet.populate()).

For instance, we can add column percentages and (unweighted and weighted) base sizes to all Links of batch2 like this:

>>> stack.aggregate(views=['c%', 'cbase'], unweighted_base=True, batches='batch2', verbose=False)
>>> stack.describe()
            data           filter       x          y                  view  #
Example Data (A)         men only      q1         q1                   NaN  1
Example Data (A)         men only      q1          @                   NaN  1
Example Data (A)         men only      q1     gender                   NaN  1
Example Data (A)         men only       @         q6                   NaN  1
Example Data (A)         men only      q2  ethnicity                   NaN  1
Example Data (A)         men only      q2   locality                   NaN  1
Example Data (A)         men only    q6_1         q1                   NaN  1
Example Data (A)         men only    q6_1          @                   NaN  1
Example Data (A)         men only    q6_1     gender                   NaN  1
Example Data (A)         men only    q6_2         q1                   NaN  1
Example Data (A)         men only    q6_2          @                   NaN  1
Example Data (A)         men only    q6_2     gender                   NaN  1
Example Data (A)         men only    q6_3         q1                   NaN  1
Example Data (A)         men only    q6_3          @                   NaN  1
Example Data (A)         men only    q6_3     gender                   NaN  1
Example Data (A)         men only  gender         q1                   NaN  1
Example Data (A)         men only  gender          @                   NaN  1
Example Data (A)         men only  gender     gender                   NaN  1
Example Data (A)         men only      q6          @                   NaN  1
Example Data (A)  (men only)+(q1)      q1         q1                   NaN  1
Example Data (A)  (men only)+(q1)      q1          @                   NaN  1
Example Data (A)  (men only)+(q1)      q1   locality                   NaN  1
Example Data (A)  (men only)+(q1)      q1  ethnicity                   NaN  1
Example Data (A)  (men only)+(q1)      q1     gender                   NaN  1
Example Data (A)        no_filter     q2b          @     x|f|:|y|weight|c%  1
Example Data (A)        no_filter     q2b          @  x|f|x:||weight|cbase  1
Example Data (A)        no_filter     q2b          @        x|f|x:|||cbase  1
Example Data (A)        no_filter     q2b     gender     x|f|:|y|weight|c%  1
Example Data (A)        no_filter     q2b     gender  x|f|x:||weight|cbase  1
Example Data (A)        no_filter     q2b     gender        x|f|x:|||cbase  1

As the output shows, Views are added only to the Links defined by batch2 and are automatically weighted according to the weight definition of batch2, which is evident from the view keys (e.g. x|f|:|y|weight|c%). Combining the four Link attributes (data, filter, x, y) with a view key yields a pd.DataFrame and its associated meta information:

>>> link = stack['Example Data (A)']['no_filter']['q2b']['@']
>>> view_key = 'x|f|:|y|weight|c%'

>>> link[view_key]
Question               q2b
Values                   @
Question Values
q2b      1       11.992144
         2       80.802580
         3        7.205276

>>> link[view_key].meta()
{
    "agg": {
        "weights": "weight",
        "name": "c%",
        "grp_text_map": null,
        "text": "",
        "fullname": "x|f|:|y|weight|c%",
        "is_weighted": true,
        "method": "frequency",
        "is_block": false
    },
    "x": {
        "is_array": false,
        "name": "q2b",
        "is_multi": false,
        "is_nested": false
    },
    "shape": [
        3,
        1
    ],
    "y": {
        "is_array": false,
        "name": "@",
        "is_multi": false,
        "is_nested": false
    }
}
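
The view keys shown above consist of six parts separated by `|`. The following sketch splits a key into those parts. The function parse_view_key and the field names axis, method, relation and rel_to are illustrative assumptions; only weights and name are confirmed by the meta() output above.

```python
from collections import namedtuple

# Field names are illustrative. Only "weights" and "name" correspond to
# entries that appear in the meta() output above.
ViewKey = namedtuple(
    'ViewKey', ['axis', 'method', 'relation', 'rel_to', 'weights', 'name'])


def parse_view_key(key):
    """Split a view key such as 'x|f|:|y|weight|c%' into its six parts."""
    parts = key.split('|')
    if len(parts) != 6:
        raise ValueError('expected 6 parts, got %d: %r' % (len(parts), key))
    return ViewKey(*parts)


weighted = parse_view_key('x|f|:|y|weight|c%')
unweighted_base = parse_view_key('x|f|x:|||cbase')
print(weighted.weights, weighted.name)
print(repr(unweighted_base.weights), unweighted_base.name)
```

An empty weights part, as in the second key, marks an unweighted View.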

Now we add Views to all Links defined by batch1 as well:

>>> stack.aggregate(views=['c%', 'counts', 'cbase'], unweighted_base=True, batches='batch1', verbose=False)
>>> stack.describe(['x', 'view'], 'y').loc[['@', 'q6'], ['@', 'q6']]
y                            @   q6
x  view
@  x|f|:|y|weight_a|c%     NaN  1.0
   x|f|:||weight_a|counts  NaN  1.0
q6 x|f|:|y|weight_a|c%     1.0  NaN
   x|f|:||weight_a|counts  1.0  NaN

Even when unweighted bases are requested, they are skipped for array summaries and transposed arrays.
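
As background for the cbase and c% Views, the difference between an unweighted and a weighted base can be sketched in plain Python. The respondent data and the helper column_percentages are made up for illustration; this shows the general idea only, not Quantipy's implementation.

```python
# Hypothetical respondent records: (answer code, weight).
respondents = [(1, 0.5), (2, 1.5), (2, 1.0), (3, 1.0)]

# Unweighted base: number of respondents who answered.
unweighted_base = len(respondents)

# Weighted base: sum of the respondents' weights.
weighted_base = sum(weight for _, weight in respondents)


def column_percentages(records):
    """Return the weighted percentage of each answer code."""
    total = sum(weight for _, weight in records)
    sums = {}
    for code, weight in records:
        sums[code] = sums.get(code, 0.0) + weight
    return {code: 100.0 * s / total for code, s in sums.items()}


percentages = column_percentages(respondents)
print(unweighted_base, weighted_base)
print(percentages)
```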

Because y_on_y is requested, two Links with Views are created for every variable that is used as both crossbreak and downbreak and has an extended filter (q1 in this example):

>>> stack.describe(['y', 'filter', 'view'], 'x').loc['q1', 'q1']
filter           view
(men only)+(q1)  x|f|:|y|weight_a|c%       1.0
                 x|f|:||weight_a|counts    1.0
                 x|f|x:||weight_a|cbase    1.0
                 x|f|x:|||cbase            1.0
men only         x|f|:|y|weight_a|c%       1.0
                 x|f|:||weight_a|counts    1.0
                 x|f|x:||weight_a|cbase    1.0
                 x|f|x:|||cbase            1.0

The first one is the aggregation defined by the Batch construction plan; the second one is the y_on_y aggregation, which uses only the main Batch.filter.

Non-categorical variables

>>> batch3 = dataset.add_batch('batch3')
>>> batch3.add_x('age')
>>> stack = dataset.populate('batch3')
>>> stack.describe()
               data     filter    x  y  view  #
0  Example Data (A)  no_filter  age  @   NaN  1

Non-categorical variables (int or float) are handled in a special way. There are two options:

Treat them like categorical variables: Pass them to the categorize parameter; counts, percentage and sum aggregations can then be added alongside the cbase View.

>>> stack.aggregate(views=['c%', 'counts', 'cbase', 'counts_sum', 'c%_sum'],
...                 unweighted_base=True,
...                 categorize=['age'],
...                 batches='batch3',
...                 verbose=False)

>>> stack.describe()
            data     filter    x  y                     view  #
Example Data (A)  no_filter  age  @           x|f|:|||counts  1
Example Data (A)  no_filter  age  @     x|f.c:f|x:|y||c%_sum  1
Example Data (A)  no_filter  age  @              x|f|:|y||c%  1
Example Data (A)  no_filter  age  @           x|f|x:|||cbase  1
Example Data (A)  no_filter  age  @  x|f.c:f|x:|||counts_sum  1

Do not categorize the variable: Only cbase is created, and descriptive statistics Views must be added separately. The method issues a warning:

>>> stack.aggregate(views=['c%', 'counts', 'cbase', 'counts_sum', 'c%_sum'],
...                 unweighted_base=True,
...                 batches='batch3',
...                 verbose=True)
Warning: Found 1 non-categorized numeric variable(s): ['age'].
Descriptive statistics must be added!

>>> stack.describe()
               data     filter    x  y            view  #
0  Example Data (A)  no_filter  age  @  x|f|x:|||cbase  1
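
The two options can be contrasted with a small plain-Python sketch. The age values are made up, and the code only illustrates the difference between tabulating distinct values and summarizing a numeric variable; it does not call Quantipy.

```python
from collections import Counter

ages = [21, 22, 22, 25, 25, 25]  # made-up values
base = len(ages)

# Option 1: treat every distinct value as a category and tabulate it.
counts = Counter(ages)
percentages = {value: 100.0 * n / base for value, n in counts.items()}

# Option 2: keep the variable numeric and describe it with a statistic.
mean_age = sum(ages) / float(base)

print(sorted(counts.items()))
print(percentages[25])
print(round(mean_age, 2))
```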

Descriptive statistics

>>> b_name = 'batch4'
>>> batch4 = dataset.add_batch(b_name)
>>> batch4.add_x(['q2b', 'q6', 'age'])
>>> stack = dataset.populate(b_name)
>>> stack.aggregate(views=['counts', 'cbase'], batches=b_name, verbose=False)

>>> stack.describe()
            data     filter     x  y            view  #
Example Data (A)  no_filter   q2b  @  x|f|:|||counts  1
Example Data (A)  no_filter   q2b  @  x|f|x:|||cbase  1
Example Data (A)  no_filter  q6_1  @  x|f|:|||counts  1
Example Data (A)  no_filter  q6_1  @  x|f|x:|||cbase  1
Example Data (A)  no_filter  q6_2  @  x|f|:|||counts  1
Example Data (A)  no_filter  q6_2  @  x|f|x:|||cbase  1
Example Data (A)  no_filter  q6_3  @  x|f|:|||counts  1
Example Data (A)  no_filter  q6_3  @  x|f|x:|||cbase  1
Example Data (A)  no_filter   age  @  x|f|x:|||cbase  1
Example Data (A)  no_filter    q6  @  x|f|:|||counts  1
Example Data (A)  no_filter    q6  @  x|f|x:|||cbase  1

Descriptive statistics Views such as mean, stddev, min, max, median, etc. can be added with the method stack.add_stats(). The parameters other_source, rescale and exclude control the calculation. Again, each combination of these parameters results in a unique view key. Note that arrays included in on_vars are unrolled, which means that all of their array items also receive the added View:

>>> stack.add_stats(on_vars=['q2b', 'age'], stats='mean', _batches=b_name, verbose=False)
>>> stack.add_stats(on_vars=['q6'], stats='stddev', _batches=b_name, verbose=False)
>>> stack.add_stats(on_vars=['q2b'], stats='mean', rescale={1:100, 2:50, 3:0},
...                 custom_text='rescale mean', _batches=b_name, verbose=False)

>>> stack.describe('view', 'x')
x                               age  q2b   q6  q6_1  q6_2  q6_3
view
x|d.mean|x:|||stat              1.0  1.0  NaN   NaN   NaN   NaN
x|d.mean|x[{100,50,0}]:|||stat  NaN  1.0  NaN   NaN   NaN   NaN
x|d.stddev|x:|||stat            NaN  NaN  1.0   1.0   1.0   1.0
x|f|:|||counts                  NaN  1.0  1.0   1.0   1.0   1.0
x|f|x:|||cbase                  1.0  1.0  1.0   1.0   1.0   1.0
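
The effect of a rescale mapping such as {1: 100, 2: 50, 3: 0} on a mean can be sketched as follows. The counts are made up, and the sketch assumes that every code is replaced by its mapped value before the statistic is computed; how Quantipy treats codes that are missing from the mapping is not covered here.

```python
counts = {1: 10, 2: 30, 3: 60}    # made-up frequencies per answer code
rescale = {1: 100, 2: 50, 3: 0}   # mapping used in the example above


def mean_from_counts(counts, rescale=None):
    """Mean of the (optionally rescaled) codes, weighted by their counts."""
    total = sum(counts.values())
    weighted_sum = 0
    for code, n in counts.items():
        value = rescale[code] if rescale else code
        weighted_sum += value * n
    return weighted_sum / float(total)


plain_mean = mean_from_counts(counts)
rescaled_mean = mean_from_counts(counts, rescale)
print(plain_mean, rescaled_mean)
```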

Nets

>>> b_name = 'batch5'
>>> batch5 = dataset.add_batch(b_name)
>>> batch5.add_x(['q2b', 'q6'])
>>> stack = dataset.populate(b_name)
>>> stack.aggregate(views=['counts', 'c%', 'cbase'], batches=b_name, verbose=False)

>>> stack.describe('view', 'x')
x               q2b  q6  q6_1  q6_2  q6_3
view
x|f|:|y||c%       1   1     1     1     1
x|f|:|||counts    1   1     1     1     1
x|f|x:|||cbase    1   1     1     1     1

Net-like Views can be added with the method Stack.add_nets() by defining net_maps for selected variables. There is a distinction between two different types of net Views:

Expanded nets: The existing counts or percentage Views are replaced with the new net Views, which show the net-defining codes after or before the computed net groups (i.e. “overcode” nets).

>>> stack.add_nets('q2b', [{'Top2': [1, 2]}], expand='after', _batches=b_name, verbose=False)

>>> stack.describe('view', 'x')
x                       q2b   q6  q6_1  q6_2  q6_3
view
x|f|:|y||c%             NaN  1.0   1.0   1.0   1.0
x|f|:|||counts          NaN  1.0   1.0   1.0   1.0
x|f|x:|||cbase          1.0  1.0   1.0   1.0   1.0
x|f|x[{1,2}+]*:|y||net  1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}+]*:|||net   1.0  NaN   NaN   NaN   NaN

Non-expanded nets: The new net Views, which contain only the computed net groups, are added to the stack.

>>> stack.add_nets('q2b', [{'Top2': [1, 2]}], _batches=b_name, verbose=False)

>>> stack.describe('view', 'x')
x                       q2b   q6  q6_1  q6_2  q6_3
view
x|f|:|y||c%             NaN  1.0   1.0   1.0   1.0
x|f|:|||counts          NaN  1.0   1.0   1.0   1.0
x|f|x:|||cbase          1.0  1.0   1.0   1.0   1.0
x|f|x[{1,2}+]*:|y||net  1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}+]*:|||net   1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}]:|y||net    1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}]:|||net     1.0  NaN   NaN   NaN   NaN

The difference between the two net types is also visible in the view keys: x|f|x[{1,2}+]*:|||net versus x|f|x[{1,2}]:|||net.
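
Conceptually, a net counts every respondent who gave at least one of the net-defining codes. The sketch below illustrates this with made-up answers and two hypothetical helpers, net_count and code_count. The row layout used for the expanded variant (net row followed by its codes) is an assumption for illustration and not taken from Quantipy's output.

```python
# Each respondent's answers as a set of codes (made-up data).
answers = [{1}, {2}, {1, 2}, {3}, {2, 3}]


def net_count(answers, codes):
    """Count respondents who gave at least one of the given codes."""
    codes = set(codes)
    return sum(1 for given in answers if given & codes)


def code_count(answers, code):
    """Count respondents who gave one specific code."""
    return sum(1 for given in answers if code in given)


top2 = net_count(answers, [1, 2])

# Non-expanded: only the computed net group.
not_expanded = [('Top2', top2)]

# Expanded (assumed layout): the net group followed by its defining codes.
expanded_after = [('Top2', top2)] + [(c, code_count(answers, c)) for c in [1, 2]]

print(not_expanded)
print(expanded_after)
```

Note that the net (4) is smaller than the sum of its code counts (2 + 3), because a respondent who gave both codes is counted only once.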

Net definitions

To create more complex net definitions, the method quantipy.net() can be used. It generates a well-formatted instruction dict and appends it to the net_map. It is especially helpful for attaching texts for several valid text_keys. The next example shows how to prepare a net for 'q6' (promoters, detractors):

>>> q6_net = qp.net([], [1, 2, 3, 4, 5, 6], 'Promotors', ['en-GB', 'sv-SE'])
>>> q6_net = qp.net(q6_net, [9, 10], {'en-GB': 'Detractors',
...                                   'sv_SE': 'Detractors',
...                                   'de-DE': 'Kritiker'})
>>> qp.net(q6_net[0], text='Promoter', text_key='de-DE')

>>> print q6_net
[
    {
        "1": [1, 2, 3, 4, 5, 6],
        "text": {
            "en-GB": "Promotors",
            "sv-SE": "Promotors",
            "de-DE": "Promoter"
        }
    },
    {
        "2": [9, 10],
        "text": {
            "en-GB": "Detractors",
            "sv_SE": "Detractors",
            "de-DE": "Kritiker"
        }
    }
]
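
The structure printed above is a plain list of dicts, each holding a numbered net key with its codes and a text dict keyed by text_key. The following sketch builds such a list by hand. The helper append_net is hypothetical and is not how quantipy.net() is implemented; it only mirrors the shape of the output.

```python
def append_net(net_map, codes, text, text_keys=None):
    """Append one net definition to net_map (hypothetical helper)."""
    if isinstance(text, dict):
        labels = dict(text)
    else:
        # Use the same label for every requested text_key.
        labels = {key: text for key in (text_keys or [])}
    net_map.append({str(len(net_map) + 1): list(codes), 'text': labels})
    return net_map


net_map = append_net([], [1, 2, 3, 4, 5, 6], 'Promotors', ['en-GB', 'sv-SE'])
net_map = append_net(net_map, [9, 10], {'en-GB': 'Detractors',
                                        'de-DE': 'Kritiker'})
print(net_map)
```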

Calculations

Stack.add_nets() has the parameter calc, which allows adding Views that are calculated from the defined nets. The method qp.calc() is a helper that creates a well-formatted instruction dict for the calculation. For instance, to calculate the NPS (promoters - detractors) for 'q6', take the net definition from the example above and create the following calculation:

>>> q6_calc = qp.calc((1, '-', 2), 'NPS', ['en-GB', 'sv-SE', 'de-DE'])

>>> print q6_calc
OrderedDict([('calc', ('net_1', <built-in function sub>, 'net_2')),
            ('calc_only', False),
            ('text', {'en-GB': 'NPS',
                      'sv-SE': 'NPS',
                      'de-DE': 'NPS'})])

>>> stack.add_nets('q6', q6_net, calc=q6_calc, _batches=b_name, verbose=False)

>>> stack.describe('view', 'x')
x                                                   q2b   q6  q6_1  q6_2  q6_3
view
x|f.c:f|x[{1,2,3,4,5,6}],x[{9,10}],x[{1,2,3,4,5...  NaN  1.0   1.0   1.0   1.0
x|f.c:f|x[{1,2,3,4,5,6}],x[{9,10}],x[{1,2,3,4,5...  NaN  1.0   1.0   1.0   1.0
x|f|:|y||c%                                         NaN  1.0   1.0   1.0   1.0
x|f|:|||counts                                      NaN  1.0   1.0   1.0   1.0
x|f|x:|||cbase                                      1.0  1.0   1.0   1.0   1.0
x|f|x[{1,2}+]*:|y||net                              1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}+]*:|||net                               1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}]:|y||net                                1.0  NaN   NaN   NaN   NaN
x|f|x[{1,2}]:|||net                                 1.0  NaN   NaN   NaN   NaN

As the output shows, nets that are added to arrays are also applied to all array items.
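
The calc entry printed above is a tuple of two net names and an operator function. How such an instruction can be evaluated is sketched below with made-up net percentages. The helper apply_calc is hypothetical and only illustrates the idea of combining two nets with operator.sub.

```python
import operator

# Made-up net percentages, keyed like the names in the calc tuple above.
net_values = {'net_1': 55.0, 'net_2': 20.0}

# Same shape as the 'calc' entry printed by qp.calc().
calc = ('net_1', operator.sub, 'net_2')


def apply_calc(calc, values):
    """Combine two net values with the operator given in the calc tuple."""
    left, op, right = calc
    return op(values[left], values[right])


result = apply_calc(calc, net_values)
print(result)
```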

Cumulative sums

Cumulative sum Views can be added for a specified collection of x-keys (xks) of the Stack using stack.cumulative_sum(). These Views always replace the regular counts and percentage Views:

>>> b_name = 'batch6'
>>> batch6 = dataset.add_batch(b_name)
>>> batch6.add_x(['q2b', 'q6'])
>>> stack = dataset.populate(b_name)
>>> stack.aggregate(views=['counts', 'c%', 'cbase'], batches=b_name, verbose=False)

>>> stack.cumulative_sum('q6', verbose=False)

>>> stack.describe('view', 'x')
x                             q2b   q6  q6_1  q6_2  q6_3
view
x|f.c:f|x++:|y||c%_cumsum     NaN  1.0   1.0   1.0   1.0
x|f.c:f|x++:|||counts_cumsum  NaN  1.0   1.0   1.0   1.0
x|f|:|y||c%                   1.0  NaN   NaN   NaN   NaN
x|f|:|||counts                1.0  NaN   NaN   NaN   NaN
x|f|x:|||cbase                1.0  1.0   1.0   1.0   1.0
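
What a cumulative sum View contains can be sketched with made-up counts: each row holds the running total of all rows up to and including it. The sketch uses itertools.accumulate (Python 3) and is independent of Quantipy.

```python
from itertools import accumulate

counts = [5, 15, 30, 50]   # made-up counts per answer code, in code order
total = sum(counts)

# Running totals of the counts.
cum_counts = list(accumulate(counts))

# Running totals expressed as percentages of the base.
cum_percentages = [100.0 * c / total for c in cum_counts]

print(cum_counts)
print(cum_percentages)
```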

Significance tests

>>> batch2 = dataset.get_batch('batch2')
>>> batch2.set_sigtests([0.05])
>>> batch5 = dataset.get_batch('batch5')
>>> batch5.set_sigtests([0.01, 0.05])
>>> stack = dataset.populate(['batch2', 'batch5'])

>>> stack.aggregate(['counts', 'cbase'], batches=['batch2', 'batch5'], verbose=False)

>>> stack.describe(['view', 'y'], 'x')
x                            q2b   q6  q6_1  q6_2  q6_3
view                 y
x|f|:||weight|counts @       1.0  NaN   NaN   NaN   NaN
                     gender  1.0  NaN   NaN   NaN   NaN
x|f|:|||counts       @       1.0  1.0   1.0   1.0   1.0
x|f|x:||weight|cbase @       1.0  NaN   NaN   NaN   NaN
                     gender  1.0  NaN   NaN   NaN   NaN
x|f|x:|||cbase       @       1.0  1.0   1.0   1.0   1.0
                     gender  1.0  NaN   NaN   NaN   NaN

Significance tests can only be added Batch-wise, which also means that significance levels must be defined for each Batch before running stack.add_tests().

>>> stack.add_tests(['batch2', 'batch5'], verbose=False)

>>> stack.describe(['view', 'y'], 'x')
x                                               q2b   q6  q6_1  q6_2  q6_3
view                                    y
x|f|:||weight|counts                    @       1.0  NaN   NaN   NaN   NaN
                                        gender  1.0  NaN   NaN   NaN   NaN
x|f|:|||counts                          @       1.0  1.0   1.0   1.0   1.0
x|f|x:||weight|cbase                    @       1.0  NaN   NaN   NaN   NaN
                                        gender  1.0  NaN   NaN   NaN   NaN
x|f|x:|||cbase                          @       1.0  1.0   1.0   1.0   1.0
                                        gender  1.0  NaN   NaN   NaN   NaN
x|t.props.Dim.01|:|||significance       @       1.0  NaN   1.0   1.0   1.0
x|t.props.Dim.05|:||weight|significance @       1.0  NaN   NaN   NaN   NaN
                                        gender  1.0  NaN   NaN   NaN   NaN
x|t.props.Dim.05|:|||significance       @       1.0  NaN   1.0   1.0   1.0
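
The view keys above contain the tested levels (.01 and .05) and indicate tests on proportions. As a rough illustration of what comparing two column proportions at a given level means, the following sketch implements a textbook pooled two-proportion z-test. It is an assumption-laden stand-in: the test Quantipy actually runs may differ in its variance estimate, its handling of weights and overlapping samples, and its base definitions.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Textbook pooled z-test for two independent proportions.

    Returns the z statistic and the two-sided p-value.
    """
    pooled = (p1 * n1 + p2 * n2) / float(n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up column proportions and bases.
z, p_value = two_proportion_z_test(0.60, 200, 0.45, 200)
for level in (0.01, 0.05):
    print(level, p_value < level)
```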