quantify.engine

class quantipy.Quantity(link, weight=None, base_all=False, ignore_flags=False)

The Quantity object is the main Quantipy aggregation engine.

It consists of a Link’s data matrix representation and the sectional definition of the weight vector (wv), the x-codes section (xsect) and the y-codes section (ysect). The instance methods handle creation, retrieval and manipulation of the data input matrices and section definitions as well as the majority of statistical calculations.

calc(expression, axis='x', result_only=False)

Compute (simple) aggregation-level arithmetic.

count(axis=None, raw_sum=False, cum_sum=False, effective=False, margin=True, as_df=True)

Count entries over all cells or per axis margin.

Parameters:
  • axis ({None, 'x', 'y'}, default None) – When axis is None, the frequency of all cells from the uni- or multivariate distribution is presented. If the axis is specified as either 'x' or 'y', the margin per axis becomes the resulting aggregation.
  • raw_sum (bool, default False) – If True will perform a simple summation over the cells given the axis parameter. This ignores net counting of qualifying answers in favour of summing over all answers given when considering margins.
  • cum_sum (bool, default False) – If True a cumulative sum of the elements along the given axis is returned.
  • effective (bool, default False) – If True, compute effective counts instead of traditional (weighted) counts.
  • margin (bool, default True) – Controls whether the margins of the aggregation result are shown. This also applies to margin aggregations themselves, since they contain a margin (in the form of the total number of cases) as well.
  • as_df (bool, default True) – Controls whether the aggregation is transformed into a Quantipy-multiindexed pandas.DataFrame (following the Question/Values convention) or left in its numpy.array format.
Returns:

Passes a pandas.DataFrame or numpy.array of cell or margin counts to the result property.

Return type:

self
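
The difference between cell counts, axis margins and effective counts can be sketched in plain Python. This is not Quantipy's implementation: the data, the helper names and the use of Kish's effective sample size, (sum of weights)² / (sum of squared weights), are assumptions made for illustration, and the sketch does not claim to reproduce how the axis argument maps onto rows or columns.

```python
# Hypothetical single-coded data: (x_code, y_code, weight) per respondent.
cases = [
    (1, 1, 1.0),
    (1, 2, 2.0),
    (2, 1, 0.5),
    (2, 2, 1.5),
    (2, 2, 1.0),
]

def cell_counts(cases):
    """Weighted frequency of every (x, y) cell."""
    cells = {}
    for x, y, w in cases:
        cells[(x, y)] = cells.get((x, y), 0.0) + w
    return cells

def margin_by(cases, position):
    """Weighted margin per code; position 0 groups by x, position 1 by y."""
    totals = {}
    for case in cases:
        totals[case[position]] = totals.get(case[position], 0.0) + case[2]
    return totals

def effective_base(weights):
    """Kish's effective sample size (an assumed definition of 'effective')."""
    weights = list(weights)
    return sum(weights) ** 2 / sum(w * w for w in weights)

cells = cell_counts(cases)
x_margin = margin_by(cases, 0)
y_margin = margin_by(cases, 1)
ebase = effective_base(w for _, _, w in cases)
```

With unequal weights the effective base is smaller than the number of cases, which is why it is the more conservative base for testing.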

exclude(codes, axis='x')

Wrapper for _missingfy(…keep_codes=False, …, keep_base=False, …). Excludes the specified codes from the aggregation.

filter(condition, keep_base=True, inplace=False)

Use a Quantipy conditional expression to filter the data matrix entries.

group(groups, axis='x', expand=None, complete=False)

Build simple or logical net vectors, optionally keeping the originating codes.

Parameters:
  • groups (list, dict of lists or logic expression) –

    The group/net code definition(s) in the form of…

    • a simple list: [1, 2, 3]
    • a dict of lists: {'grp A': [1, 2, 3], 'grp B': [4, 5, 6]}
    • a logical expression: not_any([1, 2])
  • axis ({'x', 'y'}, default 'x') – The axis to group codes on.
  • expand ({None, 'before', 'after'}, default None) – If 'before', the codes that are grouped will be kept and placed before the grouped aggregation; vice versa for 'after'. Ignored on logical expressions found in groups.
  • complete (bool, default False) – If True, codes that define the Link on the given axis but are not present in the groups definition(s) will be placed in their natural position within the aggregation, respecting the value of expand.
Returns:

Return type:

None
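
What a net is, and where the originating codes end up under expand, can be illustrated with a small stand-alone sketch. The answer sets and the helpers net_count, raw_sum and row_order are hypothetical and are not part of Quantipy; they only mirror the behaviour described above for a multi-coded question.

```python
# Hypothetical multi-coded answers: the set of codes each respondent selected.
answers = [{1, 2}, {2}, {4}, {1, 3, 5}, set()]

def net_count(answers, codes):
    """Respondents who picked at least one of the codes (each counted once)."""
    codes = set(codes)
    return sum(1 for picked in answers if picked & codes)

def raw_sum(answers, codes):
    """All answers given within the codes (a respondent may count repeatedly)."""
    codes = set(codes)
    return sum(len(picked & codes) for picked in answers)

def row_order(name, codes, expand=None):
    """Row layout of one group: grouped codes before or after the net, or net only."""
    if expand == 'before':
        return list(codes) + [name]
    if expand == 'after':
        return [name] + list(codes)
    return [name]

groups = {'grp A': [1, 2, 3], 'grp B': [4, 5, 6]}
nets = {name: net_count(answers, codes) for name, codes in groups.items()}
sums = {name: raw_sum(answers, codes) for name, codes in groups.items()}
layout = row_order('grp A', groups['grp A'], expand='before')
```

For 'grp A' the net is 3 respondents while the raw sum is 5 answers, which is the same distinction the raw_sum parameter of count() draws for margins.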

limit(codes, axis='x')

Wrapper for _missingfy(…keep_codes=True, …, keep_base=True, …). Restricts the data matrix entries to contain the specified codes only.

normalize(on='y', per_cell=False)

Convert a raw cell count result to its percentage representation.

Parameters:
  • on ({'y', 'x', 'counts_sum', str}, default 'y') – Defines the base to normalize the result on. 'y' will produce column percentages, 'x' will produce row percentages. It is also possible to use another question’s frequencies to compute rebased percentages by providing its name instead.
  • per_cell (bool, default False) – Compute percentages on a cell-per-cell basis, effectively treating each categorical row as a base figure of its own. Only possible if the on argument does not indicate an axis result ('x', 'y', 'counts_sum') but another variable’s name instead. The related xdef codes collections must be of identical length for this to work, otherwise a ValueError is raised.
Returns:

Updates a count-based aggregation in the result property.

Return type:

self
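
Column versus row percentages can be sketched as follows. The figures and function names are illustrative only, and the sketch assumes a single-coded question, so that the base of a column or row equals the sum of its cells; Quantipy's own base handling is not reproduced here.

```python
# Hypothetical cell counts: rows are x codes, columns are y codes.
counts = [
    [10.0, 30.0],
    [30.0, 30.0],
]

def column_percentages(counts):
    """Percentages based on each column total (the on='y' idea)."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in counts
    ]

def row_percentages(counts):
    """Percentages based on each row total (the on='x' idea)."""
    return [[100.0 * cell / sum(row) for cell in row] for row in counts]

col_pct = column_percentages(counts)
row_pct = row_percentages(counts)
```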

rescale(scaling, drop=False)

Modify the object’s xdef property to reflect new value definitions.

Parameters:
  • scaling (dict) – Mapping of old_code: new_code, with codes given as int or float.
  • drop (bool, default False) – If True, codes not included in the scaling dict will be excluded.
Returns:

Return type:

self
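
The effect of a scaling dict on a subsequent mean can be sketched on a plain {code: count} distribution. The helpers rescale_counts and mean_of are hypothetical stand-ins, not Quantipy functions, and the treatment of unmapped codes when drop is False (kept with their old value) is an assumption drawn from the parameter description.

```python
def rescale_counts(counts_by_code, scaling, drop=False):
    """Return the counts keyed by their new code values."""
    rescaled = {}
    for code, n in counts_by_code.items():
        if code in scaling:
            new_code = scaling[code]
        elif drop:
            continue  # code is not part of the scaling and gets excluded
        else:
            new_code = code  # assumption: unmapped codes keep their value
        rescaled[new_code] = rescaled.get(new_code, 0) + n
    return rescaled

def mean_of(counts_by_code):
    """Mean of the code values, using the counts as frequencies."""
    total = sum(counts_by_code.values())
    return sum(code * n for code, n in counts_by_code.items()) / total

counts = {1: 10, 2: 20, 3: 10, 98: 5}   # 98 stands for a "don't know" code
scaling = {1: 0, 2: 50, 3: 100}

kept = rescale_counts(counts, scaling)
dropped = rescale_counts(counts, scaling, drop=True)
```

Dropping the unmapped code 98 keeps it from distorting the mean of the rescaled 0 to 100 scale.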

summarize(stat='summary', axis='x', margin=True, as_df=True)

Calculate distribution statistics across the given axis.

Parameters:
  • stat ({'summary', 'mean', 'median', 'var', 'stddev', 'sem', 'varcoeff', 'min', 'lower_q', 'upper_q', 'max'}, default 'summary') – The measure to calculate. Defaults to a summary output of the most important sample statistics.
  • axis ({'x', 'y'}, default 'x') – The axis which is reduced in the aggregation, e.g. column vs. row means.
  • margin (bool, default True) – Controls whether statistic(s) of the marginal distribution are shown.
  • as_df (bool, default True) – Controls whether the aggregation is transformed into a Quantipy-multiindexed pandas.DataFrame (following the Question/Values convention) or left in its numpy.array format.
Returns:

Passes a pandas.DataFrame or numpy.array of the descriptive (summary) statistic(s) to the result property.

Return type:

self
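
Several of the listed measures can be derived from a code distribution alone, as the following sketch shows. It is an illustration under stated assumptions, not Quantipy's algorithm: the describe helper is hypothetical, the variance uses an (n - 1) denominator, and varcoeff is taken to be the coefficient of variation (stddev / mean).

```python
import math

def describe(counts_by_code):
    """Descriptive statistics of a numeric distribution given as {code: count}."""
    n = sum(counts_by_code.values())
    mean = sum(code * k for code, k in counts_by_code.items()) / n
    # Assumption: unbiased sample variance with an (n - 1) denominator.
    var = sum(k * (code - mean) ** 2 for code, k in counts_by_code.items()) / (n - 1)
    stddev = math.sqrt(var)
    observed = [code for code, k in counts_by_code.items() if k > 0]
    return {
        'mean': mean,
        'var': var,
        'stddev': stddev,
        'sem': stddev / math.sqrt(n),   # standard error of the mean
        'varcoeff': stddev / mean,      # assumed: coefficient of variation
        'min': min(observed),
        'max': max(observed),
    }

stats = describe({1: 2, 2: 4, 3: 2})
```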

swap(var, axis='x', update_axis_def=True, inplace=True)

Change the Quantity’s x- or y-axis keeping filter and weight setup.

All edits and aggregation results will be removed during the swap.

Parameters:
  • var (str) – New variable’s name used in axis swap.
  • axis ({'x', 'y'}, default 'x') – The axis to swap.
  • update_axis_def (bool, default True) – If self is of type 'array', the name and item definitions (that are e.g. used in the to_df() method) can be updated to reflect the swapped axis variable or kept to show the original ones.
  • inplace (bool, default True) – Whether to modify the Quantity inplace or return a new instance.
Returns:

swapped – New Quantity instance with exchanged x- or y-axis.

Return type:

Quantity

unweight()

Remove any weighting by dividing the matrix by itself.

weight()

Weight by multiplying the indicator entries with the weight vector.
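
The two operations can be pictured on a small indicator matrix with one row per respondent. This is a pure-Python sketch with hypothetical names, not the numpy code Quantipy uses; zeros are handled explicitly here because dividing a zero entry by itself is undefined.

```python
# Hypothetical indicator matrix: 1 marks that a respondent holds the code.
indicator = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
weights = [0.5, 2.0, 1.5]  # one weight per respondent (row)

def apply_weight(matrix, wv):
    """Multiply every indicator entry of a row by that respondent's weight."""
    return [[cell * w for cell in row] for row, w in zip(matrix, wv)]

def remove_weight(matrix):
    """Divide each non-zero entry by itself, which turns any weight back into 1."""
    return [[cell / cell if cell != 0 else 0.0 for cell in row] for row in matrix]

weighted = apply_weight(indicator, weights)
restored = remove_weight(weighted)
```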

class quantipy.Test(link, view_name_notation, test_total=False)

The Quantipy Test object is defined by a Link and the view name notation string of a counts or means view. All auxiliary figures needed to arrive at the test results are computed inside the instance of the object.

get_se()

Compute the standard error (se) estimate of the tested metric.

The calculation of the se is defined by the parameters of the setup. The main difference is the handling of variances: unpooled implicitly assumes variance inhomogeneity between the column pairing’s samples, while pooled treats the variances effectively as equal.
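
The pooled and unpooled variants can be contrasted with the textbook standard errors for a difference between two independent means. These formulas are a generic sketch; the exact expressions Quantipy uses (including effective bases and overlap correction) are not given in this reference, and the function names are hypothetical.

```python
import math

def se_unpooled(var1, n1, var2, n2):
    """Standard error assuming unequal variances (Welch-style)."""
    return math.sqrt(var1 / n1 + var2 / n2)

def se_pooled(var1, n1, var2, n2):
    """Standard error assuming a common variance shared by both samples."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Equal variances and equal bases: both estimates coincide.
same = (se_pooled(4.0, 100, 4.0, 100), se_unpooled(4.0, 100, 4.0, 100))

# A small low-variance sample against a large high-variance one: they diverge.
different = (se_pooled(1.0, 10, 9.0, 90), se_unpooled(1.0, 10, 9.0, 90))
```

The second pairing shows why the choice matters: pooling lets the larger sample's variance dominate, which changes the resulting test statistic.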

get_sig()

TODO: implement returning tstats only.

get_statistic()

Returns the test statistic of the algorithm.

run()

Performs the testing algorithm and creates an output pd.DataFrame.

The output is indexed according to Quantipy’s Questions->Values convention. Significant results between columns are presented as lists of integer y-axis codes, where the column with the higher value holds the codes of the columns with the lower values. NaN indicates that a cell does not hold a significantly higher value than any other column.
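
The output convention for a single row can be sketched as follows. The column values, the set of significant pairs and the helper column_results are invented for illustration; how significance itself is decided is left out, and the real output is a pd.DataFrame rather than a dict.

```python
import math

def column_results(values, significant_pairs):
    """For each y code, list the codes of the significantly lower columns.

    values: {y_code: cell value} for one row of the view.
    significant_pairs: set of frozenset({a, b}) column pairs found significant.
    """
    results = {}
    for code, value in values.items():
        lower = sorted(
            other
            for other, other_value in values.items()
            if other != code
            and other_value < value
            and frozenset((code, other)) in significant_pairs
        )
        # NaN marks a cell without any significantly lower column.
        results[code] = lower if lower else math.nan
    return results

values = {1: 52.0, 2: 38.0, 3: 41.0}
significant_pairs = {frozenset((1, 2)), frozenset((1, 3))}
results = column_results(values, significant_pairs)
```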

set_params(test_total=False, level='mid', mimic='Dim', testtype='pooled', use_ebase=True, ovlp_correc=True, cwi_filter=False, flag_bases=None)

Sets the test algorithm parameters and defines the type of test.

This method sets the test’s global parameters and derives the necessary measures for the computation of the test statistic. The default values correspond to the SPSS Dimensions Column Tests algorithms that control for bias introduced by weighting and overlapping samples in the column pairs of multi-coded questions.

Note

The Dimensions implementation uses variance pooling.

Parameters:
  • test_total (bool, default False) – If set to True, the test algorithms will also include an existent total (@-) version of the original link and test against the unconditional data distribution.
  • level (str or float, default 'mid') – The level of significance, given either as one of 'low' = 0.1, 'mid' = 0.05, 'high' = 0.01 or as a specific float, e.g. 0.15.
  • mimic ({'askia', 'Dim'}, default 'Dim') – Will instruct the mimicking of a software-specific test.
  • testtype (str, default 'pooled') – Global definition of the tests.
  • use_ebase (bool, default True) – If True, will use the effective sample sizes instead of the simple weighted ones when testing a weighted aggregation.
  • ovlp_correc (bool, default True) – If True, will consider and correct for respondent overlap when testing between multi-coded column pairs.
  • cwi_filter (bool, default False) – If True, will check an incoming count aggregation for cells that fall below a threshold comparison aggregation that assumes counts to be independent.
  • flag_bases (list of two int, default None) – If provided, the output dataframe will replace results that have been calculated on (eff.) bases below the first int with '**' and mark results in columns with bases below the second int with '*'.
Returns:

Return type:

self
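
How the level and flag_bases arguments could be interpreted is sketched below. Both helpers are hypothetical and not part of Quantipy; in particular, appending '*' to a result is only one possible reading of "mark", chosen here for illustration.

```python
# Named significance levels as listed in the level parameter description.
LEVELS = {'low': 0.1, 'mid': 0.05, 'high': 0.01}

def resolve_level(level):
    """Translate a named level or a float into a numeric significance level."""
    if isinstance(level, str):
        return LEVELS[level]
    return float(level)

def flag_result(result, base, flag_bases=None):
    """Apply small-base flags to the textual result of one column."""
    if flag_bases is None:
        return result
    very_small, small = flag_bases
    if base < very_small:
        return '**'                    # result is replaced entirely
    if base < small:
        return '{}*'.format(result)    # assumption: marking appends '*'
    return result

alpha = resolve_level('mid')
flags = [flag_result('[1, 2]', base, [30, 100]) for base in (20, 50, 500)]
```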