Stack

class quantipy.Stack(name='', add_data=None)

Container of quantipy.Link objects holding View objects.

A Stack is a nested dictionary that structures the data and variable relationships, storing all View aggregations performed.
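
The sketch below uses plain dicts to show how such nesting might look. It assumes the levels run data_key → filter → x → y → view, which is the order in which reduce() lists its parameters. The keys, the use of '@' as a total column, and iter_links() are hypothetical illustrations, not quantipy internals.

```python
# Plain dicts stand in for quantipy.Link / View objects; all keys are hypothetical.
# Assumed nesting order: data_key -> filter -> x -> y -> view.
stack_sketch = {
    "survey": {                        # data_key
        "no_filter": {                 # filter
            "q1": {                    # x variable
                "@": {                 # y variable ('@' used here as the total)
                    "counts": [10, 20, 30],   # view name -> aggregation result
                },
                "gender": {
                    "counts": [[4, 6], [9, 11], [14, 16]],
                },
            },
        },
    },
}


def iter_links(stack):
    """Yield a (data_key, filter, x, y) tuple for every Link-level entry."""
    for data_key, filters in stack.items():
        for filter_key, xs in filters.items():
            for x, ys in xs.items():
                for y in ys:
                    yield data_key, filter_key, x, y


links = list(iter_links(stack_sketch))
print(links)
```

Each tuple corresponds to one Link position, and the innermost dict holds that Link's views.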

add_data(data_key, data=None, meta=None)

Sets the data_key into the stack, optionally mapping data sources to it.

It is possible to handle the mapping of data sources in different ways:

  • no meta or data (for proxy links not connected to source data)
  • meta only (for proxy links with supporting meta)
  • data only (meta will be inferred if possible)
  • data and meta
Parameters:
  • data_key (str) – The reference name for a data source connected to the Stack.
  • data (pandas.DataFrame) – The input (case) data source.
  • meta (dict or OrderedDict) – A quantipy compatible metadata source that describes the case data.
Returns:

Return type:

None

Add Link and View definitions to the Stack.

The method can be used flexibly: it is possible to pass only Link definitions, which may be composed of filter, x and y specifications, only views (incl. weight variable selections), or arbitrary combinations of the former.

TODO:

Remove variables from parameter list and method calls.

Parameters:
  • data_keys (str, optional) – The data_key to be added to. If none is given, the method will try to add to all data_keys found in the Stack.
  • filters (list of str, default ['no_filter']) – Filter definitions. Each string must be a valid input for the pandas.DataFrame.query() method.
  • x, y – The x and y variables to construct Links from.
  • views (list of view method names) – Can be any of Quantipy’s preset Views or the names of created view method specifications.
  • weights (list, optional) – The names of weight variables to consider in the data aggregation process. Weight variables must be of type float.
Returns:

Return type:

None
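
Judging from the parameter list, a Link is identified by a data key, a filter, an x and a y variable, so passing several x and y variables presumably yields one Link per combination. The sketch below shows that combinatorics with itertools.product; expand_link_definitions() is a hypothetical helper, not part of quantipy, and it creates no Link objects.

```python
from itertools import product


def _as_list(value):
    """Wrap a single str in a list; pass lists/tuples through unchanged."""
    return [value] if isinstance(value, str) else list(value)


def expand_link_definitions(data_keys, filters, x, y):
    """Hypothetical helper: every (data_key, filter, x, y) combination.

    This only illustrates how the specifications combine; it is not
    quantipy code and does not create any Link objects.
    """
    return list(product(_as_list(data_keys), _as_list(filters),
                        _as_list(x), _as_list(y)))


defs = expand_link_definitions("survey", ["no_filter"], ["q1", "q2"], ["@", "gender"])
print(len(defs))  # 1 * 1 * 2 * 2 = 4
print(defs[0])    # ('survey', 'no_filter', 'q1', '@')
```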

add_nets(on_vars, net_map, expand=None, calc=None, rebase=None, text_prefix='Net:', checking_cluster=None, _batches='all', recode='auto', mis_in_rec=False, verbose=True)

Add a net-like view to a specified collection of x keys of the stack.

Parameters:
  • on_vars (list) – The list of x variables to add the view to.
  • net_map (list of dicts) –

    The listed dicts must map the net/band text label to lists of categorical answer codes to group together, e.g.:

    >>> [{'Top3': [1, 2, 3]},
    ...  {'Bottom3': [4, 5, 6]}]

    It is also possible to provide enumerated net definition dictionaries
    that explicitly set 'text' metadata per text_key entry:
    
    >>> [{1: [1, 2], 'text': {'en-GB': 'UK NET TEXT',
    ...                       'da-DK': 'DK NET TEXT',
    ...                       'de-DE': 'DE NET TEXT'}}]
    
  • expand ({'before', 'after'}, default None) – If provided, the view will list the net-defining codes after or before the computed net groups (i.e. “overcode” nets).
  • calc (dict, default None) –

    A dictionary that attaches a text label to a calculation expression using the net definitions. The nets are referenced as ‘net_1’, ‘net_2’, ‘net_3’, … . Supported calculation expressions are add, sub, div and mul. Example:

    >>> {'calc': ('net_1', add, 'net_2'), 'text': {'en-GB': 'UK CALC LAB',
    ...                                            'da-DK': 'DA CALC LAB',
    ...                                            'de-DE': 'DE CALC LAB'}}
    
  • rebase (str, default None) – Use another variable’s margin value vector for the column percentage computation.
  • text_prefix (str, default 'Net:') – By default, each code grouping/net will have its text label prefixed with ‘Net: ‘. To disable this, pass None (or an empty str, ‘’).
  • checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the Cluster instance.
  • _batches (str or list of str) – Views are only added to the qp.Link objects defined in these qp.Batch instances.
  • recode ({‘extend_codes’, ‘drop_codes’, ‘collect_codes’, ‘collect_codes@cat_name’}, default ‘auto’) – Adds a variable with the nets as codes to the DataSet/Stack. If ‘extend_codes’, the codes are extended with the nets. If ‘drop_codes’, the new variable only contains the nets as codes. If ‘collect_codes’ or ‘collect_codes@cat_name’, the variable contains the nets and another category that summarises all codes not included in any net. If no cat_name is provided, ‘Other’ is used as the default.
  • mis_in_rec (bool, default False) – Skip or include codes that are defined as missing when recoding from net definition.
Returns:

The stack instance is modified inplace.

Return type:

None
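
The core of a net is a sum of answer counts over the codes it groups, and calc combines two of those sums. The sketch below assumes that add, sub, div and mul refer to binary functions such as Python's operator.add, operator.sub, operator.truediv (standing in for div) and operator.mul. compute_nets() is hypothetical; it handles only the simple net_map form, a bare calc tuple, and ignores expand, rebase and the enumerated 'text' variant.

```python
import operator


def compute_nets(code_counts, net_map, calc=None, text_prefix="Net:"):
    """Hypothetical sketch: group code counts into nets, then apply one calc.

    code_counts: {code: count} for a single categorical variable.
    net_map: simple form only, e.g. [{'Top3': [1, 2, 3]}, {'Bottom3': [4, 5, 6]}].
    calc: (left_ref, binary_function, right_ref), e.g. ('net_1', operator.sub, 'net_2').
    """
    results = {}
    refs = {}  # 'net_1', 'net_2', ... -> net total, used by calc
    for i, net in enumerate(net_map, start=1):
        label, codes = next(iter(net.items()))
        total = sum(code_counts.get(code, 0) for code in codes)
        text = f"{text_prefix} {label}" if text_prefix else label
        results[text] = total
        refs[f"net_{i}"] = total
    if calc is not None:
        left, func, right = calc
        results["calc"] = func(refs[left], refs[right])
    return results


counts = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25, 6: 25}
nets = compute_nets(counts, [{"Top3": [1, 2, 3]}, {"Bottom3": [4, 5, 6]}],
                    calc=("net_1", operator.sub, "net_2"))
print(nets)  # {'Net: Top3': 30, 'Net: Bottom3': 70, 'calc': -40}
```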

add_stats(on_vars, stats=['mean'], other_source=None, rescale=None, drop=True, exclude=None, factor_labels=True, custom_text=None, checking_cluster=None, _batches='all', recode=False, verbose=True)

Add a descriptives view to a specified collection of x keys of the stack.

Valid descriptives views: {‘mean’, ‘stddev’, ‘min’, ‘max’, ‘median’, ‘sem’}

Parameters:
  • on_vars (list) – The list of x variables to add the view to.
  • stats (list of str, default ['mean']) – The metrics to compute and add as a view.
  • other_source (str) – If provided, the Link’s x-axis variable will be swapped with the (numerical) variable provided. This can be used to attach statistics of a different variable to a Link definition.
  • rescale (dict) – A dict that maps old to new codes, e.g. {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
  • drop (bool, default True) – If rescale is provided, all codes that are not mapped will be ignored in the computation.
  • exclude (list) – Codes/values to ignore in the computation.
  • factor_labels (bool / str, default True) – Writes the (rescaled) factor values next to the category text label. If True, square brackets are used; if ‘()’, parentheses are used.
  • custom_text (str, default None) – A custom string affix to put at the end of the requested statistics’ names.
  • checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the Cluster instance.
  • _batches (str or list of str) – Views are only added to the qp.Link objects defined in these qp.Batch instances.
  • recode (bool, default False) – Create a new variable that contains only the values which are needed for the stat computation. The values and the included data will be rescaled.
Returns:

The stack instance is modified inplace.

Return type:

None
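
How rescale, drop and exclude interact is easiest to see on a mean. The sketch below is one reading of the parameter descriptions, not quantipy's implementation: exclude is checked against the original codes, rescale replaces codes, and drop=False keeps unmapped codes at their original value (an assumption). rescaled_mean() is hypothetical.

```python
import math


def rescaled_mean(values, rescale=None, drop=True, exclude=None):
    """Hypothetical sketch of the mean under add_stats()-style options.

    exclude: codes ignored entirely (checked on the original codes here).
    rescale: maps old code -> new value.
    drop:    if True, codes not in rescale are ignored; if False they keep
             their original value.
    """
    excluded = set(exclude or [])
    used = []
    for value in values:
        if value in excluded:
            continue
        if rescale is not None:
            if value in rescale:
                value = rescale[value]
            elif drop:
                continue
        used.append(value)
    return sum(used) / len(used) if used else math.nan


answers = [1, 1, 2, 5, 99]
reverse = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
print(rescaled_mean(answers, rescale=reverse))              # 3.75 (99 dropped)
print(rescaled_mean(answers, rescale=reverse, drop=False))  # 22.8 (99 kept)
print(rescaled_mean(answers, exclude=[99]))                 # 2.25
```

The drop=False case shows why unmapped codes such as a "don't know" 99 usually need to be dropped or excluded.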

add_tests(_batches='all', verbose=True)

Apply column tests (coltests) for the selected batches.

Significance levels are taken from the qp.Batch definitions.

Parameters:
  • _batches (str or list of str) – Views are only added to the qp.Link objects defined in these qp.Batch instances.
Returns:

Return type:

None

aggregate(views, unweighted_base=True, categorize=[], batches='all', xs=None, bases={}, verbose=True)

Add views to all qp.Link objects defined in the qp.Stack.

Parameters:
  • views (str or list of str or qp.ViewMapper) – The views to add.
  • unweighted_base (bool, default True) – If True, an unweighted ‘cbase’ is added to all non-arrays. This parameter will be deprecated in the future; please use bases instead.
  • categorize (str or list of str) – Determines how numerical data is handled: If provided, the variables will get counts and percentage aggregations ('counts', 'c%') alongside the 'cbase' view. If False, only 'cbase' views are generated for non-categorical types.
  • batches (str/ list of str, default 'all') – Name(s) of qp.Batch instance(s) that are used to aggregate the qp.Stack.
  • xs (list of str) – Names of the variables for which views are added.
  • bases (dict) – Defines which bases should be aggregated, weighted or unweighted.
Returns:

Return type:

None, modify qp.Stack inplace
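
The 'counts', 'c%' and 'cbase' views named above can be illustrated on a single categorical column. The sketch below is unweighted and assumes that None marks a missing answer and that cbase is the number of answered cases; counts_cbase_cpct() is hypothetical and not quantipy's aggregation code.

```python
from collections import Counter


def counts_cbase_cpct(values):
    """Unweighted 'counts', 'cbase' and 'c%' for one categorical column.

    None marks a missing answer (assumption); cbase counts the answered cases.
    """
    answered = [v for v in values if v is not None]
    cbase = len(answered)
    counts = dict(sorted(Counter(answered).items()))
    cpct = {code: n / cbase * 100 for code, n in counts.items()} if cbase else {}
    return counts, cbase, cpct


counts, cbase, cpct = counts_cbase_cpct([1, 2, 2, 3, None])
print(counts, cbase, cpct)  # {1: 1, 2: 2, 3: 1} 4 {1: 25.0, 2: 50.0, 3: 25.0}
```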

apply_meta_edits(batch_name, data_key, filter_key=None, freeze=False)

Adopt the meta_edits from the Batch definitions.

Parameters:
  • batch_name (str) – Name of the Batch whose meta_edits are taken.
  • data_key (str) – Accessing this metadata: self[data_key].meta. Batch definitions are taken from here, and this metadata is modified.
  • filter_key (str, default None) – Currently not implemented! Accessing this metadata: self[data_key][filter_key].meta. Batch definitions are taken from here, and this metadata is modified.

cumulative_sum(on_vars, _batches='all', verbose=True)

Add a cumulative sum view to a specified collection of x keys of the stack.

Parameters:
  • on_vars (list) – The list of x variables to add the view to.
  • _batches (str or list of str) – Views are only added to the qp.Link objects defined in these qp.Batch instances.
Returns:

The stack instance is modified inplace.

Return type:

None
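
A cumulative sum view presumably accumulates values down the category order of the x variable. The sketch below applies that to column percentages under this assumption; the actual view may operate on counts as well, and running_total() is hypothetical.

```python
from itertools import accumulate


def running_total(col_pct):
    """Running total of per-code values, in ascending code order."""
    codes = sorted(col_pct)
    return dict(zip(codes, accumulate(col_pct[code] for code in codes)))


print(running_total({1: 25.0, 2: 50.0, 3: 25.0}))  # {1: 25.0, 2: 75.0, 3: 100.0}
```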

describe(index=None, columns=None, query=None, split_view_names=False)

Generates a structured overview of all Link-defining Stack elements.

Parameters:
  • index, columns (optional) – Control the output representation by structuring a pivot-style table according to the index and column values.
  • query (str) – A query string that is valid for the pandas.DataFrame.query() method.
  • split_view_names (bool, default False) – If True, will create an output of unique view name notations split up into their components.
Returns:

description – DataFrame summarizing the Stack’s structure in terms of Links and Views.

Return type:

pandas.DataFrame

freeze_master_meta(data_key, filter_key=None)

Save .meta in .master_meta for a defined data_key.

Parameters:
  • data_key (str) – Using: self[data_key]
  • filter_key (str, default None) – Currently not implemented! Using: self[data_key][filter_key]

static from_sav(data_key, filename, name=None, path=None, ioLocale='en_US.UTF-8', ioUtf8=True)

Creates a new stack instance from a .sav file.

Parameters:
  • data_key (str) – The data_key for the data and meta in the sav file.
  • filename (str) – The name of the sav file.
  • name (str) – A name for the sav (stored in the meta).
  • path (str) – The path to the sav file.
  • ioLocale (str) – The locale used during the sav processing.
  • ioUtf8 (bool) – Indicates the mode in which text is communicated to or from the I/O module: True for UTF-8 (Unicode) mode, False for codepage mode.
Returns:

stack – A stack instance that has a data_key with data and metadata to run aggregations.

Return type:

stack object instance

static load(path_stack, compression='gzip', load_cache=False)

Load Stack instance from .stack file.

Parameters:
  • path_stack (str) – The full path to the .stack file that should be loaded, including the extension.
  • compression ({'gzip'}, default 'gzip') – The compression type that was used when saving the file.
  • load_cache (bool, default False) – Loads the MatrixCache into the Stack if a .cache file is found.
Returns:

Return type:

None

static recode_from_net_def(dataset, on_vars, net_map, expand, recode='auto', text_prefix='Net:', mis_in_rec=False, verbose=True)

Create variables from net definitions.

reduce(data_keys=None, filters=None, x=None, y=None, variables=None, views=None)

Remove keys from the matching levels, erasing discrete Stack portions.

Parameters:
  • data_keys, filters, x, y, views – The keys to remove at the respective level of the Stack.
Returns:

Return type:

None

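
Removing keys at a matching level amounts to walking the nested dictionary down to one depth and deleting the requested keys there. The sketch below assumes the level order data_key → filter → x → y → view; reduce_nested() is hypothetical and works on plain dicts, not on a real Stack.

```python
def reduce_nested(stack, level, keys):
    """Hypothetical sketch: delete `keys` at one depth of a nested dict, in place.

    level: 0 = data_key, 1 = filter, 2 = x, 3 = y, 4 = view (assumed order).
    keys:  a single key or a list of keys.
    """
    targets = {keys} if isinstance(keys, str) else set(keys)

    def walk(node, depth):
        if not isinstance(node, dict):
            return
        if depth == level:
            for key in targets & set(node):
                del node[key]
            return
        for child in node.values():
            walk(child, depth + 1)

    walk(stack, 0)


stack = {"survey": {"no_filter": {"q1": {"@": {"counts": 1}},
                                  "q2": {"@": {"counts": 2, "c%": 3}}}}}
reduce_nested(stack, 2, "q1")        # drop x variable 'q1' everywhere
reduce_nested(stack, 4, ["c%"])      # drop the 'c%' view everywhere
print(stack)  # {'survey': {'no_filter': {'q2': {'@': {'counts': 2}}}}}
```
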
refresh(data_key, new_data_key='', new_weight=None, new_data=None, new_meta=None)

Re-run all or a portion of the Stack’s aggregations for a given data key.

refresh() can be used to re-weight the data using a new case data weight variable, to re-run all aggregations based on a changed source data version (e.g. after cleaning the file/dropping cases), or a combination of both.

Note

Currently this is only supported for the preset QuantipyViews(), namely: 'cbase', 'rbase', 'counts', 'c%', 'r%', 'mean', 'ebase'.

Parameters:
  • data_key (str) – The Links’ data key to be modified.
  • new_data_key (str, default '') – Controls whether the existing data key’s files and aggregations will be overwritten or stored via a new data key.
  • new_weight (str) – The name of a new weight variable used to re-aggregate the Links.
  • new_data (pandas.DataFrame) – The case data source. If None is given, the original case data found for the data key will be used.
  • new_meta (quantipy meta document) – A metadata source associated with the case data. If None is given, the original meta definition found for the data key will be used.
Returns:

Return type:

None
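
Re-weighting with new_weight means recomputing the same aggregations with a different weight column. The sketch below shows that idea for a weighted 'counts' aggregation, using float weights as add_link's weights parameter requires. weighted_counts() is hypothetical, and None is assumed to mark a missing answer.

```python
from collections import defaultdict


def weighted_counts(values, weights):
    """Sum the case weights per categorical code; None marks a missing answer."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        if value is not None:
            totals[value] += weight
    return dict(sorted(totals.items()))


answers = [1, 1, 2]
print(weighted_counts(answers, [1.0, 1.0, 1.0]))  # {1: 2.0, 2: 1.0}
print(weighted_counts(answers, [0.5, 0.5, 2.0]))  # {1: 1.0, 2: 2.0}
```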

remove_data(data_keys)

Deletes the data_key(s) and associated data specified in the Stack.

Parameters:
  • data_keys (str or list of str) – The data keys to remove.
Returns:

Return type:

None

restore_meta(data_key, filter_key=None)

Restore the .master_meta for a defined data_key if it exists.

Undo self.apply_meta_edits()

Parameters:
  • data_key (str) – Accessing this metadata: self[data_key].meta
  • filter_key (str, default None) – Currently not implemented! Accessing this metadata: self[data_key][filter_key].meta

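
freeze_master_meta(), apply_meta_edits() and restore_meta() together describe a save, edit and undo cycle on the metadata. The sketch below shows that pattern with plain dicts and deep copies. MetaHolder and the edits format are hypothetical. In quantipy, the edits come from a Batch and are keyed by data_key, which this sketch does not model.

```python
import copy


class MetaHolder:
    """Hypothetical stand-in for one data_key entry holding .meta/.master_meta."""

    def __init__(self, meta):
        self.meta = meta
        self.master_meta = None

    def freeze_master_meta(self):
        # Deep copy, so later edits to .meta cannot change the frozen version.
        self.master_meta = copy.deepcopy(self.meta)

    def apply_meta_edits(self, edits):
        # edits: {variable_name: {meta_key: new_value}} (assumed shape).
        for name, changes in edits.items():
            self.meta["columns"][name].update(changes)

    def restore_meta(self):
        # Undo all edits, but only if a frozen copy exists.
        if self.master_meta is not None:
            self.meta = copy.deepcopy(self.master_meta)


holder = MetaHolder({"columns": {"q1": {"text": "Question 1"}}})
holder.freeze_master_meta()
holder.apply_meta_edits({"q1": {"text": "Edited label"}})
print(holder.meta["columns"]["q1"]["text"])  # Edited label
holder.restore_meta()
print(holder.meta["columns"]["q1"]["text"])  # Question 1
```
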
save(path_stack, compression='gzip', store_cache=True, decode_str=False, dataset=False, describe=False)

Save Stack instance to .stack file.

Parameters:
  • path_stack (str) – The full path to the .stack file that should be created, including the extension.
  • compression ({'gzip'}, default 'gzip') – The intended compression type.
  • store_cache (bool, default True) – Stores the MatrixCache in a file in the same location.
  • decode_str (bool, default False) – If True, the unicoder function will be used to decode all str objects found anywhere in the meta document(s).
  • dataset (bool, default False) – If True, a json/csv file will be saved alongside the saved stack for each data key in the stack.
  • describe (bool, default False) – If True, the result of stack.describe().to_excel() will be saved alongside the saved stack.
Returns:

Return type:

None
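
The documentation only states that 'gzip' is the compression type for .stack files. The sketch below shows a gzip-compressed round trip, assuming the object is serialised with pickle. That serialisation choice is an assumption, and save_compressed()/load_compressed() are hypothetical helpers, not Stack.save()/Stack.load().

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle obj into a gzip-compressed file at path."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read back an object written by save_compressed()."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.stack")
    original = {"survey": {"no_filter": {"q1": {"@": {"counts": [10, 20]}}}}}
    save_compressed(original, path)
    restored = load_compressed(path)
    print(restored == original)  # True
```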

variable_types(data_key, only_type=None, verbose=True)

Group variables by data types found in the meta.

Parameters:
  • data_key (str) – The reference name of a case data source held by the Stack instance.
  • only_type ({'int', 'float', 'single', 'delimited set', 'string', 'date', 'time', 'array'}, optional) – Restricts the output to the given data type.
Returns:

types – A summary of variable names mapped to their data types, in the form {type_name: [variable names]}, or a list of variable names conforming to only_type.

Return type:

dict or list of str
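
The two return shapes described above can be sketched on a plain meta dict. This sketch assumes a quantipy-style meta document in which meta['columns'][name]['type'] holds each variable's type and array variables live in meta['masks'] with type 'array'. group_by_type() is hypothetical.

```python
from collections import defaultdict


def group_by_type(meta, only_type=None):
    """Hypothetical sketch: map each type name to its variable names.

    Assumes a quantipy-style meta document where meta['columns'][name]['type']
    holds the type and array variables live in meta['masks'].
    """
    types = defaultdict(list)
    for section in ("columns", "masks"):
        for name, definition in meta.get(section, {}).items():
            types[definition.get("type")].append(name)
    if only_type is not None:
        return types.get(only_type, [])
    return dict(types)


meta = {
    "columns": {"q1": {"type": "single"}, "q2": {"type": "delimited set"},
                "age": {"type": "int"}, "q3": {"type": "single"}},
    "masks": {"grid": {"type": "array"}},
}
print(group_by_type(meta))
# {'single': ['q1', 'q3'], 'delimited set': ['q2'], 'int': ['age'], 'array': ['grid']}
print(group_by_type(meta, only_type="single"))  # ['q1', 'q3']
```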