Stack

class quantipy.Stack(name='', add_data=None)

Container of quantipy.Link objects holding View objects.

A Stack is a nested dictionary that structures the data and variable relationships and stores all View aggregations performed.
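To make the nesting concrete, here is a minimal plain-Python sketch. It assumes the levels are ordered data key, filter, x, y, view (mirroring the `add_link` parameters); the `nested` helper and all sample keys and values are hypothetical, and this is not the actual quantipy implementation.

```python
from collections import defaultdict


def nested():
    """Return a dictionary that creates deeper levels on first access."""
    return defaultdict(nested)


# Assumed level order: data_key -> filter -> x -> y -> view.
stack = nested()
stack["survey"]["no_filter"]["q1"]["gender"]["counts"] = {"male": 12, "female": 15}
stack["survey"]["no_filter"]["q1"]["gender"]["cbase"] = {"male": 20, "female": 22}

# Each level can be inspected like a regular dictionary.
views = sorted(stack["survey"]["no_filter"]["q1"]["gender"])
print(views)
```

Every level is addressable by key, so a single aggregation is reached by chaining the five keys.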

add_data(data_key, data=None, meta=None)

Sets the data_key in the stack, optionally mapping data sources to it.

It is possible to handle the mapping of data sources in different ways:

no meta or data (for proxy links not connected to source data)
meta only (for proxy links with supporting meta)
data only (meta will be inferred if possible)
data and meta

Parameters:	data_key (str) – The reference name for a data source connected to the Stack. data (pandas.DataFrame) – The input (case) data source. meta (dict or OrderedDict) – A quantipy compatible metadata source that describes the case data.
Returns:
Return type:	None
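The four source combinations listed for `add_data` can be told apart with two `None` checks. The sketch below is illustrative only: `mapping_mode` is a hypothetical helper, and a list of dicts stands in for the `pandas.DataFrame` so that the example needs no third-party packages.

```python
def mapping_mode(data=None, meta=None):
    """Name the data source combination passed for a data_key."""
    if data is None and meta is None:
        return "no meta or data"  # proxy link without source data
    if data is None:
        return "meta only"        # proxy link with supporting meta
    if meta is None:
        return "data only"        # meta would have to be inferred
    return "data and meta"


cases = [{"q1": 1}, {"q1": 3}]                 # stand-in for the case data
meta = {"columns": {"q1": {"type": "single"}}}  # assumed, simplified meta layout

modes = [
    mapping_mode(),
    mapping_mode(meta=meta),
    mapping_mode(data=cases),
    mapping_mode(data=cases, meta=meta),
]
print(modes)
```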

add_link(data_keys=None, filters=['no_filter'], x=None, y=None, views=None, weights=None, variables=None)

Add Link and View definitions to the Stack.

The method can be used flexibly: it is possible to pass only Link definitions (composed of filter, x and y specifications), only views (including weight variable selections), or arbitrary combinations of the two.

TODO:	Remove `variables` from parameter list and method calls.
Parameters:

data_keys (str, optional) – The data_key to be added to. If none is given, the method will try to add to all data_keys found in the Stack.
filters (list of str, default ['no_filter']) – The filter definitions. Each string must be a valid input for the pandas.DataFrame.query() method.
x, y – The x and y variables to construct Links from.
views (list of view method names) – Can be any of Quantipy’s preset Views or the names of created view method specifications.
weights (list, optional) – The names of weight variables to consider in the data aggregation process. Weight variables must be of type `float`.

Returns:
Return type:	None
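A rough way to picture how filter, x and y specifications combine into Link definitions is a Cartesian product. The sketch below assumes that one Link is defined per combination of data key, filter, x and y; `link_keys` and the variable names are hypothetical and not part of quantipy.

```python
from itertools import product


def link_keys(data_keys, filters, x, y):
    """Enumerate one (data_key, filter, x, y) tuple per Link definition."""
    return list(product(data_keys, filters, x, y))


links = link_keys(
    data_keys=["survey"],
    filters=["no_filter", "age > 30"],  # strings in DataFrame.query() syntax
    x=["q1", "q2"],
    y=["gender", "region"],
)
print(len(links))
```

The number of Link definitions grows multiplicatively with each list, which is worth keeping in mind before passing long x and y lists together with several filters.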

add_nets(on_vars, net_map, expand=None, calc=None, rebase=None, text_prefix='Net:', checking_cluster=None, _batches='all', recode='auto', mis_in_rec=False, verbose=True)

Add a net-like view to a specified collection of x keys of the stack.

Parameters:

on_vars (list) – The list of x variables to add the view to.
net_map (list of dicts) – The listed dicts must map the net/band text label to lists of categorical answer codes to group together, e.g.:

    >>> [{'Top3': [1, 2, 3]},
    ...  {'Bottom3': [4, 5, 6]}]

It is also possible to provide enumerated net definition dictionaries that explicitly set ``text`` metadata per ``text_key`` entry:

    >>> [{1: [1, 2], 'text': {'en-GB': 'UK NET TEXT',
    ...                       'da-DK': 'DK NET TEXT',
    ...                       'de-DE': 'DE NET TEXT'}}]

expand ({'before', 'after'}, default None) – If provided, the view will list the net-defining codes after or before the computed net groups (i.e. “overcode” nets).
calc (dict, default None) – A dictionary that attaches a text label to a calculation expression using the net definitions. The nets are referenced as ‘net_1’, ‘net_2’, ‘net_3’, … . Supported calculation expressions are add, sub, div, mul. Example:

    >>> {'calc': ('net_1', add, 'net_2'),
    ...  'text': {'en-GB': 'UK CALC LAB',
    ...           'da-DK': 'DA CALC LAB',
    ...           'de-DE': 'DE CALC LAB'}}

rebase (str, default None) – Use another variable’s margin value vector for the column percentage computation.
text_prefix (str, default 'Net:') – By default each code grouping/net will have its `text` label prefixed with ‘Net: ’. Disable by passing None (or an empty str, ‘’).
checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the `Cluster` instance.
_batches (str or list of str) – Views are added only for the `qp.Link`s that are defined in these `qp.Batch` instances.
recode ({‘extend_codes’, ‘drop_codes’, ‘collect_codes’, ‘collect_codes@cat_name’}, default ‘auto’) – Adds a variable with nets as codes to the DataSet/Stack. If ‘extend_codes’, the codes are extended with nets. If ‘drop_codes’, the new variable only contains nets as codes. If ‘collect_codes’ or ‘collect_codes@cat_name’, the variable contains nets and another category that summarises all codes which are not included in any net. If no cat_name is provided, ‘Other’ is taken as default.
mis_in_rec (bool, default False) – Skip or include codes that are defined as missing when recoding from the net definition.

Returns:	The stack instance is modified inplace.
Return type:	None
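To illustrate what a `net_map` expresses, the following sketch counts how many respondents fall into each net and then combines two nets with an `add` calculation. It is a simplified model, not quantipy's aggregation code: `net_counts` and the sample responses are hypothetical, `operator.add` is assumed to correspond to the `add` expression, and ‘net_1’ and ‘net_2’ are assumed to refer to the first and second net in the list.

```python
import operator


def net_counts(responses, net_map):
    """Count the respondents whose answer code falls into each net."""
    counts = {}
    for net in net_map:
        for label, codes in net.items():
            if label == "text":
                continue  # skip the optional per-language text metadata
            counts[label] = sum(1 for code in responses if code in codes)
    return counts


responses = [1, 2, 2, 3, 4, 5, 6, 6]  # made-up single-choice answer codes
net_map = [{"Top3": [1, 2, 3]}, {"Bottom3": [4, 5, 6]}]
counts = net_counts(responses, net_map)

# ('net_1', add, 'net_2'): combine the first and the second net.
ordered = list(counts.values())
total = operator.add(ordered[0], ordered[1])
print(counts, total)
```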

add_stats(on_vars, stats=['mean'], other_source=None, rescale=None, drop=True, exclude=None, factor_labels=True, custom_text=None, checking_cluster=None, _batches='all', recode=False, verbose=True)

Add a descriptives view to a specified collection of x keys of the stack.

Valid descriptives views: {‘mean’, ‘stddev’, ‘min’, ‘max’, ‘median’, ‘sem’}

Parameters:

on_vars (list) – The list of x variables to add the view to.
stats (list of str, default `['mean']`) – The metrics to compute and add as a view.
other_source (str) – If provided, the Link’s x-axis variable will be swapped with the (numerical) variable provided. This can be used to attach statistics of a different variable to a Link definition.
rescale (dict) – A dict that maps old to new codes, e.g. {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}.
drop (bool, default True) – If `rescale` is provided, all codes that are not mapped will be ignored in the computation.
exclude (list) – Codes/values to ignore in the computation.
factor_labels (bool / str, default True) – Writes the (rescaled) factor values next to the category text label. If True, square brackets are used. If ‘()’, parentheses are used.
custom_text (str, default None) – A custom string affix to put at the end of the requested statistics’ names.
checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the `Cluster` instance.
_batches (str or list of str) – Views are added only for the `qp.Link`s that are defined in these `qp.Batch` instances.
recode (bool, default False) – Create a new variable that contains only the values which are needed for the stat computation. The values and the included data will be rescaled.

Returns:	The stack instance is modified inplace.
Return type:	None
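The interplay of `rescale`, `drop` and `exclude` can be sketched on a plain list of answer codes. This is a simplified, unweighted model under stated assumptions: `rescaled_mean` is a hypothetical helper, `exclude` is assumed to refer to the original (pre-rescale) codes, and 98 is a made-up "don't know" code.

```python
from statistics import mean


def rescaled_mean(values, rescale=None, drop=True, exclude=None):
    """Mean of answer codes after optional exclusion and rescaling."""
    excluded = set(exclude or [])
    factors = []
    for value in values:
        if value in excluded:
            continue  # explicitly excluded codes never enter the computation
        if rescale is not None:
            if value in rescale:
                value = rescale[value]
            elif drop:
                continue  # unmapped codes are ignored when drop is True
        factors.append(value)
    return mean(factors)


answers = [1, 1, 2, 5, 98]
flip = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}  # reverse a 5-point scale

flipped = rescaled_mean(answers, rescale=flip)  # 98 is unmapped and dropped
plain = rescaled_mean(answers, exclude=[98])    # original codes without 98
print(flipped, plain)
```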

add_tests(_batches='all', verbose=True)

Apply column tests (coltests) for the selected batches.

Significance levels are taken from the qp.Batch definitions.

Parameters:	_batches (str or list of str) – Views are added only for the `qp.Link`s that are defined in these `qp.Batch` instances.
Returns:
Return type:	None

aggregate(views, unweighted_base=True, categorize=[], batches='all', xs=None, bases={}, verbose=True)

Add views to all qp.Links defined in the qp.Stack.

Parameters:

views (str or list of str or qp.ViewMapper) – The `views` to add.
unweighted_base (bool, default True) – If True, an unweighted ‘cbase’ is added to all non-arrays. This parameter will be deprecated in the future; please use bases instead.
categorize (str or list of str) – Determines how numerical data is handled: If provided, the variables will get counts and percentage aggregations (`'counts'`, `'c%'`) alongside the `'cbase'` view. If False, only `'cbase'` views are generated for non-categorical types.
batches (str or list of str, default 'all') – Name(s) of the `qp.Batch` instance(s) that are used to aggregate the `qp.Stack`.
xs (list of str) – Names of the variables for which views are added.
bases (dict) – Defines which bases should be aggregated, weighted or unweighted.

Returns:	The `qp.Stack` is modified inplace.
Return type:	None

apply_meta_edits(batch_name, data_key, filter_key=None, freeze=False)

Take over meta_edits from Batch definitions.

Parameters:

batch_name (str) – Name of the Batch whose meta_edits are taken.
data_key (str) – Accessing this metadata: self[data_key].meta. Batch definitions are taken from here and this metadata is modified.
filter_key (str, default None) – Currently not implemented! Accessing this metadata: self[data_key][filter_key].meta. Batch definitions are taken from here and this metadata is modified.

cumulative_sum(on_vars, _batches='all', verbose=True)

Add a cumulative sum view to a specified collection of x keys of the stack.

Parameters:	on_vars (list) – The list of x variables to add the view to. _batches (str or list of str) – Views are added only for the `qp.Link`s that are defined in these `qp.Batch` instances.
Returns:	The stack instance is modified inplace.
Return type:	None
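As a reminder of what a cumulative sum over answer codes looks like, here is a small standard-library sketch. The counts are made up, and the ordering by ascending code is an assumption; the view computed by quantipy may differ in detail.

```python
from itertools import accumulate

counts = {1: 10, 2: 25, 3: 40, 4: 15, 5: 10}  # made-up counts per answer code

# Accumulate the counts in ascending code order.
codes = sorted(counts)
cumulative = dict(zip(codes, accumulate(counts[code] for code in codes)))
print(cumulative)
```

The last entry equals the total of all counts, which is a quick sanity check for such a view.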

describe(index=None, columns=None, query=None, split_view_names=False)

Generates a structured overview of all Link defining Stack elements.

Parameters:	index, columns (optional) – Control the output representation by structuring a pivot-style table according to the index and column values. query (str) – A query string that is valid for the pandas.DataFrame.query() method. split_view_names (bool, default False) – If True, will create an output of unique view name notations split up into their components.
Returns:	description – DataFrame summarizing the Stack’s structure in terms of Links and Views.
Return type:	pandas.DataFrame
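Conceptually, such an overview flattens the nested structure into one row per view. The sketch below does this for a plain nested dictionary; it assumes the level order data key, filter, x, y, view, uses a hypothetical `link_rows` helper, and yields tuples rather than the `pandas.DataFrame` that `describe()` returns.

```python
def link_rows(tree, path=()):
    """Yield one tuple of keys per leaf of a nested dictionary."""
    for key, child in tree.items():
        if isinstance(child, dict):
            yield from link_rows(child, path + (key,))
        else:
            yield path + (key,)


# Leaves are placeholder values standing in for aggregated views.
stack = {
    "survey": {
        "no_filter": {
            "q1": {"gender": {"counts": 1, "c%": 2}},
            "q2": {"gender": {"counts": 3}},
        }
    }
}

rows = sorted(link_rows(stack))
for row in rows:
    print(row)
```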

freeze_master_meta(data_key, filter_key=None)

Save .meta in .master_meta for a defined data_key.

Parameters:	data_key (str) – Using: `self[data_key]` filter_key (str, default None) – Currently not implemented! Using: `self[data_key][filter_key]`

static from_sav(data_key, filename, name=None, path=None, ioLocale='en_US.UTF-8', ioUtf8=True)

Creates a new stack instance from a .sav file.

Parameters:

data_key (str) – The data_key for the data and meta in the sav file.
filename (str) – The name of the sav file.
name (str) – A name for the sav (stored in the meta).
path (str) – The path to the sav file.
ioLocale (str) – The locale used during the sav processing.
ioUtf8 (bool) – Indicates the mode in which text is communicated to or from the I/O module.

Returns:	stack – A stack instance that has a data_key with data and metadata to run aggregations.
Return type:	stack object instance

static load(path_stack, compression='gzip', load_cache=False)

Load Stack instance from .stack file.

Parameters:

path_stack (str) – The full path to the .stack file that should be loaded, including the extension.
compression ({'gzip'}, default 'gzip') – The compression type that was used when saving the file.
load_cache (bool, default False) – Loads the MatrixCache into the Stack if a .cache file is found.

Returns:
Return type:	None

static recode_from_net_def(dataset, on_vars, net_map, expand, recode='auto', text_prefix='Net:', mis_in_rec=False, verbose=True)

Create variables from net definitions.

reduce(data_keys=None, filters=None, x=None, y=None, variables=None, views=None)

Remove keys from the matching levels, erasing discrete Stack portions.

Parameters:	data_keys, filters, x, y, views –
Returns:
Return type:	None
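The idea of removing keys at matching levels can be sketched on a plain nested dictionary. This is an illustration under assumptions: `reduce_levels` is a hypothetical helper, the level order is taken to be data key, filter, x, y, view, and unlike `reduce()` (which returns None) the sketch returns a new dictionary instead of modifying its input.

```python
def reduce_levels(tree, remove, depth=0):
    """Return a copy of tree without the keys listed in remove[depth]."""
    if not isinstance(tree, dict) or depth >= len(remove):
        return tree
    drop = set(remove[depth] or [])
    return {
        key: reduce_levels(child, remove, depth + 1)
        for key, child in tree.items()
        if key not in drop
    }


stack = {
    "survey": {
        "no_filter": {
            "q1": {"gender": {"counts": 1, "c%": 2}},
            "q2": {"gender": {"counts": 3}},
        }
    }
}

# One entry per level: data_keys, filters, x, y, views (None keeps a level as is).
reduced = reduce_levels(stack, [None, None, ["q2"], None, ["c%"]])
print(reduced)
```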

refresh(data_key, new_data_key='', new_weight=None, new_data=None, new_meta=None)

Re-run all or a portion of Stack’s aggregations for a given data key.

refresh() can be used to re-weight the data using a new case data weight variable, to re-run all aggregations based on a changed source data version (e.g. after cleaning the file or dropping cases), or both.

Note

Currently this is only supported for the preset QuantipyViews(), namely: 'cbase', 'rbase', 'counts', 'c%', 'r%', 'mean', 'ebase'.

Parameters:

data_key (str) – The Links’ data key to be modified.
new_data_key (str, default '') – Controls if the existing data key’s files and aggregations will be overwritten or stored via a new data key.
new_weight (str) – The name of a new weight variable used to re-aggregate the Links.
new_data (pandas.DataFrame) – The case data source. If None is given, the original case data found for the data key will be used.
new_meta (quantipy meta document) – A metadata source associated with the case data. If None is given, the original meta definition found for the data key will be used.

Returns:
Return type:	None

remove_data(data_keys)

Deletes the data_key(s) and associated data specified in the Stack.

Parameters:	data_keys (str or list of str) – The data keys to remove.
Returns:
Return type:	None

restore_meta(data_key, filter_key=None)

Restore the .master_meta for a defined data_key if it exists.

This undoes self.apply_meta_edits().

Parameters:	data_key (str) – Accessing this metadata: `self[data_key].meta` filter_key (str, default None) – Currently not implemented! Accessing this metadata: `self[data_key][filter_key].meta`
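The freeze/edit/restore cycle described for `freeze_master_meta()`, `apply_meta_edits()` and `restore_meta()` can be modelled with deep copies. The class below is a hypothetical stand-in for the object found at `self[data_key]`; the attribute names `meta` and `master_meta` come from the descriptions above, while the meta layout and the use of `copy.deepcopy` are assumptions.

```python
import copy


class MetaHolder:
    """Hypothetical stand-in for the object found at stack[data_key]."""

    def __init__(self, meta):
        self.meta = meta
        self.master_meta = None

    def freeze(self):
        # Keep an independent copy so that later edits cannot alter it.
        self.master_meta = copy.deepcopy(self.meta)

    def restore(self):
        # Only restore if a frozen copy exists.
        if self.master_meta is not None:
            self.meta = copy.deepcopy(self.master_meta)


holder = MetaHolder({"columns": {"q1": {"text": {"en-GB": "Original label"}}}})
holder.freeze()
holder.meta["columns"]["q1"]["text"]["en-GB"] = "Edited label"  # a meta edit
edited = holder.meta["columns"]["q1"]["text"]["en-GB"]
holder.restore()
label = holder.meta["columns"]["q1"]["text"]["en-GB"]
print(edited, "->", label)
```

A deep copy matters here because the meta is nested; a shallow copy would share the inner dictionaries and the edit would leak into the frozen version.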

save(path_stack, compression='gzip', store_cache=True, decode_str=False, dataset=False, describe=False)

Save Stack instance to .stack file.

Parameters:

path_stack (str) – The full path to the .stack file that should be created, including the extension.
compression ({'gzip'}, default 'gzip') – The intended compression type.
store_cache (bool, default True) – Stores the MatrixCache in a file in the same location.
decode_str (bool, default False) – If True, the unicoder function will be used to decode all str objects found anywhere in the meta document(s).
dataset (bool, default False) – If True, a json/csv will be saved parallel to the saved stack for each data key in the stack.
describe (bool, default False) – If True, the result of stack.describe().to_excel() will be saved parallel to the saved stack.

Returns:
Return type:	None
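A gzip-compressed save and load round trip can be sketched with the standard library. Note the assumptions: the page only states that gzip compression is used, so the choice of `pickle` as serialization format is a guess, and `save_object`, `load_object` and the payload are hypothetical. This is not quantipy's own file handling.

```python
import gzip
import os
import pickle
import tempfile


def save_object(obj, path):
    """Write obj to a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Read an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


payload = {"survey": {"no_filter": {"q1": {"gender": {"counts": [12, 15]}}}}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.stack")  # full path incl. extension
    save_object(payload, path)
    restored = load_object(path)

print(restored == payload)
```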

variable_types(data_key, only_type=None, verbose=True)

Group variables by data types found in the meta.

Parameters:	data_key (str) – The reference name of a case data source held by the Stack instance. only_type ({'int', 'float', 'single', 'delimited set', 'string', 'date', 'time', 'array'}, optional) – Will restrict the output to the given data type.
Returns:	types – A summary of variable names mapped to their data types, in the form of {type_name: [variable names]}, or a list of the variable names matching only_type.
Return type:	dict or list of str
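The two return shapes (a dict of all types, or a list for a single type) can be sketched as follows. The example assumes a simplified meta layout in which each entry under `'columns'` carries a `'type'` key; `group_by_type` and the variable names are hypothetical.

```python
from collections import defaultdict


def group_by_type(meta, only_type=None):
    """Group variable names by the data type recorded in the meta."""
    types = defaultdict(list)
    for name, definition in meta["columns"].items():
        types[definition["type"]].append(name)
    if only_type is not None:
        return types.get(only_type, [])  # list of names for one type
    return dict(types)                   # {type_name: [variable names]}


meta = {
    "columns": {
        "age": {"type": "int"},
        "weight": {"type": "float"},
        "gender": {"type": "single"},
        "income": {"type": "int"},
    }
}

all_types = group_by_type(meta)
ints = group_by_type(meta, only_type="int")
print(all_types, ints)
```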