Stack
-
class quantipy.Stack(name='', add_data=None)
Container of quantipy.Link objects holding View objects.
A Stack is a nested dictionary that structures the data and variable relationships and stores all View aggregations performed.
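The nesting can be pictured with plain dictionaries. The sketch below is not quantipy code: it assumes a key order of data key, filter, x, y, view, and every name in it (`make_stack`, `add_view`, the sample keys and numbers) is hypothetical.

```python
from collections import defaultdict


def make_stack():
    """Return an auto-nesting dict: a missing key creates another level."""
    return defaultdict(make_stack)


def add_view(stack, data_key, filter_key, x, y, view_key, result):
    """Store an aggregation result at its five-level address."""
    stack[data_key][filter_key][x][y][view_key] = result


stack = make_stack()
add_view(stack, "survey", "no_filter", "q1", "gender", "counts", {1: 10, 2: 12})
add_view(stack, "survey", "no_filter", "q1", "gender", "c%", {1: 45.5, 2: 54.5})

# Look up one stored view, then list all views held on the same link.
counts = stack["survey"]["no_filter"]["q1"]["gender"]["counts"]
views_on_link = sorted(stack["survey"]["no_filter"]["q1"]["gender"])
```

Each level narrows the address by one key, so a single chained lookup reaches one aggregation result.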
-
add_data(data_key, data=None, meta=None)
Sets the data_key in the Stack, optionally mapping data sources to it.
It is possible to handle the mapping of data sources in different ways:
- no meta or data (for proxy links not connected to source data)
- meta only (for proxy links with supporting meta)
- data only (meta will be inferred if possible)
- data and meta
Parameters: - data_key (str) – The reference name for a data source connected to the Stack.
- data (pandas.DataFrame) – The input (case) data source.
- meta (dict or OrderedDict) – A quantipy compatible metadata source that describes the case data.
Returns: Return type: None
-
add_link(data_keys=None, filters=['no_filter'], x=None, y=None, views=None, weights=None, variables=None)
Add Link and View definitions to the Stack.
The method can be used flexibly: it is possible to pass only Link definitions (composed of filter, x and y specifications), only views including weight variable selections, or arbitrary combinations of the two.
TODO: Remove variables from parameter list and method calls.
Parameters: - data_keys (str, optional) – The data_key to be added to. If none is given, the method will try to add to all data_keys found in the Stack.
- filters (list of str describing filter definitions, default ['no_filter']) – The string must be a valid input for the pandas.DataFrame.query() method.
- x, y – The x and y variables to construct Links from.
- views (list of view method names.) – Can be any of Quantipy’s preset Views or the names of created view method specifications.
- weights (list, optional) – The names of weight variables to consider in the data aggregation process. Weight variables must be of type float.
Returns: Return type: None
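One way to read the x and y parameters is that each x variable is paired with each y variable under every filter. The following plain-Python illustration assumes such a full cross product; `expand_links` and all variable names are made up for the example and are not part of quantipy.

```python
from itertools import product


def expand_links(data_keys, filters, x, y):
    """Enumerate (data_key, filter, x, y) tuples for a full cross product."""
    return list(product(data_keys, filters, x, y))


links = expand_links(
    data_keys=["survey"],
    filters=["no_filter", "age >= 18"],  # strings in DataFrame.query() syntax
    x=["q1", "q2"],
    y=["gender", "region"],
)
```

With one data key, two filters, two x variables and two y variables, this yields eight link addresses.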
-
add_nets(on_vars, net_map, expand=None, calc=None, rebase=None, text_prefix='Net:', checking_cluster=None, _batches='all', recode='auto', mis_in_rec=False, verbose=True)
Add a net-like view to a specified collection of x keys of the stack.
Parameters: - on_vars (list) – The list of x variables to add the view to.
- net_map (list of dicts) –
The listed dicts must map the net/band text label to lists of categorical answer codes to group together, e.g.:
>>> [{'Top3': [1, 2, 3]},
...  {'Bottom3': [4, 5, 6]}]
It is also possible to provide enumerated net definition dictionaries that explicitly set text metadata per text_key entry:
>>> [{1: [1, 2], 'text': {'en-GB': 'UK NET TEXT',
...                       'da-DK': 'DK NET TEXT',
...                       'de-DE': 'DE NET TEXT'}}]
- expand ({'before', 'after'}, default None) – If provided, the view will list the net-defining codes after or before the computed net groups (i.e. “overcode” nets).
- calc (dict, default None) –
A dictionary that attaches a text label to a calculation expression using the net definitions. The nets are referenced as ‘net_1’, ‘net_2’, ‘net_3’, … . Supported calculation expressions are add, sub, div, mul. Example:
>>> {'calc': ('net_1', add, 'net_2'),
...  'text': {'en-GB': 'UK CALC LAB',
...           'da-DK': 'DA CALC LAB',
...           'de-DE': 'DE CALC LAB'}}
- rebase (str, default None) – Use another variable’s margin value vector for column percentage computation.
- text_prefix (str, default 'Net:') – By default each code grouping/net will have its text label prefixed with ‘Net: ’. Toggle by passing None (or an empty str, ‘’).
- checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the Cluster instance.
- _batches (str or list of str) – Views are only added for qp.Links that are defined in these qp.Batch instances.
- recode ({‘extend_codes’, ‘drop_codes’, ‘collect_codes’, ‘collect_codes@cat_name’}, default ‘auto’) – Adds a variable with nets as codes to the DataSet/Stack. If ‘extend_codes’, codes are extended with nets. If ‘drop_codes’, the new variable only contains nets as codes. If ‘collect_codes’ or ‘collect_codes@cat_name’, the variable contains nets and another category that summarises all codes which are not included in any net. If no cat_name is provided, ‘Other’ is taken as default.
- mis_in_rec (bool, default False) – Skip or include codes that are defined as missing when recoding from net definition.
Returns: The stack instance is modified inplace.
Return type: None
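To make the net_map and calc parameters more concrete, here is a minimal plain-Python sketch of what a net amounts to: a count of cases whose answer code belongs to a group of codes. It does not call quantipy; `net_counts` and the sample answers are hypothetical, and the sketch assumes one single-choice answer per respondent.

```python
import operator


def net_counts(answers, net_map):
    """Count respondents whose answer code falls into each net."""
    counts = {}
    for net in net_map:
        for label, codes in net.items():
            if label == "text":  # skip label metadata in enumerated definitions
                continue
            counts[label] = sum(1 for code in answers if code in codes)
    return counts


answers = [1, 2, 2, 3, 3, 4, 6]  # hypothetical single-choice answers
net_map = [{"Top3": [1, 2, 3]}, {"Bottom3": [4, 5, 6]}]

nets = net_counts(answers, net_map)
# A calc-style expression on the two nets: first net minus second net.
difference = operator.sub(nets["Top3"], nets["Bottom3"])
```

The `operator` functions mirror the add, sub, div and mul expressions named for the calc parameter.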
-
add_stats(on_vars, stats=['mean'], other_source=None, rescale=None, drop=True, exclude=None, factor_labels=True, custom_text=None, checking_cluster=None, _batches='all', recode=False, verbose=True)
Add a descriptives view to a specified collection of x keys of the stack.
Valid descriptives views: {‘mean’, ‘stddev’, ‘min’, ‘max’, ‘median’, ‘sem’}
Parameters: - on_vars (list) – The list of x variables to add the view to.
- stats (list of str, default ['mean']) – The metrics to compute and add as a view.
- other_source (str) – If provided the Link’s x-axis variable will be swapped with the (numerical) variable provided. This can be used to attach statistics of a different variable to a Link definition.
- rescale (dict) – A dict that maps old to new codes, e.g. {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
- drop (bool, default True) – If rescale is provided all codes that are not mapped will be ignored in the computation.
- exclude (list) – Codes/values to ignore in the computation.
- factor_labels (bool / str, default True) – Writes the (rescaled) factor values next to the category text label. If True, square-brackets are used. If ‘()’, normal brackets are used.
- custom_text (str, default None) – A custom string affix to put at the end of the requested statistics’ names.
- checking_cluster (quantipy.Cluster, default None) – When provided, an automated checking aggregation will be added to the Cluster instance.
- _batches (str or list of str) – Views are only added for qp.Links that are defined in these qp.Batch instances.
- recode (bool, default False) – Create a new variable that contains only the values which are needed for the stat computation. The values and the included data will be rescaled.
Returns: The stack instance is modified inplace.
Return type: None
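The interplay of rescale, drop and exclude can be sketched in a few lines of plain Python. This is an illustration only, not quantipy's implementation: `rescaled_mean` is a hypothetical helper, and it assumes that exclude is checked against the original codes before rescaling.

```python
def rescaled_mean(values, rescale=None, drop=True, exclude=None):
    """Mean of codes after optional exclusion and rescaling."""
    excluded = set(exclude or [])
    factors = []
    for code in values:
        if code in excluded:
            continue  # assumption: exclusion applies to the original codes
        if rescale is not None:
            if code in rescale:
                code = rescale[code]
            elif drop:
                continue  # unmapped codes are ignored when drop=True
        factors.append(code)
    return sum(factors) / len(factors) if factors else None


values = [1, 1, 2, 5, 98]  # 98 stands in for a "don't know" code
rescale = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

mean_dropped = rescaled_mean(values, rescale=rescale, drop=True)
mean_kept = rescaled_mean(values, rescale=rescale, drop=False)
mean_excluded = rescaled_mean(values, exclude=[98])
```

With drop=True the unmapped code 98 does not distort the mean; with drop=False it is kept as a factor.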
-
add_tests(_batches='all', verbose=True)
Apply coltests for selected batches.
Sig. levels are taken from qp.Batch definitions.
Parameters: _batches (str or list of str) – Views are only added for qp.Links that are defined in these qp.Batch instances.
Returns: Return type: None
-
aggregate(views, unweighted_base=True, categorize=[], batches='all', xs=None, bases={}, verbose=True)
Add views to all defined qp.Link in qp.Stack.
Parameters: - views (str or list of str or qp.ViewMapper) – views that are added.
- unweighted_base (bool, default True) – If True, unweighted ‘cbase’ is added to all non-arrays. This parameter will be deprecated in future, please use bases instead.
- categorize (str or list of str) – Determines how numerical data is handled: If provided, the variables will get counts and percentage aggregations ('counts', 'c%') alongside the 'cbase' view. If False, only 'cbase' views are generated for non-categorical types.
- batches (str / list of str, default 'all') – Name(s) of qp.Batch instance(s) that are used to aggregate the qp.Stack.
- xs (list of str) – Names of the variables for which views are added.
- bases (dict) – Defines which bases should be aggregated, weighted or unweighted.
Returns: Return type: None, modify qp.Stack inplace
-
apply_meta_edits(batch_name, data_key, filter_key=None, freeze=False)
Take over meta_edits from Batch definitions.
Parameters: - batch_name (str) – Name of the Batch whose meta_edits are taken.
- data_key (str) – Accessing this metadata: self[data_key].meta. Batch definitions are taken from here and this metadata is modified.
- filter_key (str, default None) – Currently not implemented! Accessing this metadata: self[data_key][filter_key].meta. Batch definitions are taken from here and this metadata is modified.
-
cumulative_sum(on_vars, _batches='all', verbose=True)
Add a cumulative sum view to a specified collection of x keys of the stack.
Parameters: - on_vars (list) – The list of x variables to add the view to.
- _batches (str or list of str) – Views are only added for qp.Links that are defined in these qp.Batch instances.
Returns: The stack instance is modified inplace.
Return type: None
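As a reminder of what a cumulative sum over category frequencies looks like, here is a small standalone illustration using only the standard library. The frequencies are invented and the snippet does not use quantipy.

```python
from itertools import accumulate

counts = {1: 10, 2: 25, 3: 40, 4: 15, 5: 10}  # category code -> frequency

# Accumulate the frequencies in ascending code order.
codes = sorted(counts)
cumulative = dict(zip(codes, accumulate(counts[code] for code in codes)))
```

The last cumulative value equals the total number of counted cases.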
-
describe(index=None, columns=None, query=None, split_view_names=False)
Generates a structured overview of all Link defining Stack elements.
Parameters: - index, columns (optional) – Controls the output representation by structuring a pivot-style table according to the index and column values.
- query (str) – A query string that is valid for the pandas.DataFrame.query() method.
- split_view_names (bool, default False) – If True, will create an output of unique view name notations split up into their components.
Returns: description – DataFrame summarizing the Stack’s structure in terms of Links and Views.
Return type: pandas.DataFrame
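Conceptually, such an overview flattens the nested structure into one record per stored view. The sketch below shows that idea on a plain nested dict with an assumed data, filter, x, y, view layout. It returns tuples rather than a DataFrame, and `flatten` and the sample keys are hypothetical.

```python
from collections import Counter


def flatten(stack):
    """Yield one (data, filter, x, y, view) record per stored view."""
    for data_key, filters in stack.items():
        for filter_key, xs in filters.items():
            for x, ys in xs.items():
                for y, views in ys.items():
                    for view in views:
                        yield (data_key, filter_key, x, y, view)


stack = {
    "survey": {
        "no_filter": {
            "q1": {"gender": {"counts": None, "c%": None}},
            "q2": {"gender": {"counts": None}},
        }
    }
}

records = list(flatten(stack))
# Pivot-style summary: number of views per x variable.
views_per_x = Counter(x for _, _, x, _, _ in records)
```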
-
freeze_master_meta(data_key, filter_key=None)
Save .meta in .master_meta for a defined data_key.
Parameters: - data_key (str) – Using: self[data_key]
- filter_key (str, default None) – Currently not implemented! Using: self[data_key][filter_key]
-
static from_sav(data_key, filename, name=None, path=None, ioLocale='en_US.UTF-8', ioUtf8=True)
Creates a new stack instance from a .sav file.
Parameters: - data_key (str) – The data_key for the data and meta in the sav file.
- filename (str) – The name of the sav file.
- name (str) – A name for the sav (stored in the meta).
- path (str) – The path to the sav file.
- ioLocale (str) – The locale used during the sav processing.
- ioUtf8 (bool) – Boolean that indicates the mode in which text communicated to or from the I/O module will be encoded.
Returns: stack – A stack instance that has a data_key with data and metadata to run aggregations.
Return type: stack object instance
-
static load(path_stack, compression='gzip', load_cache=False)
Load Stack instance from .stack file.
Parameters: - path_stack (str) – The full path to the .stack file that should be loaded, including the extension.
- compression ({'gzip'}, default 'gzip') – The compression type that was used when saving the file.
- load_cache (bool, default False) – Loads MatrixCache into the Stack if a .cache file is found.
Returns: Return type: None
-
static recode_from_net_def(dataset, on_vars, net_map, expand, recode='auto', text_prefix='Net:', mis_in_rec=False, verbose=True)
Create variables from net definitions.
-
reduce(data_keys=None, filters=None, x=None, y=None, variables=None, views=None)
Remove keys from the matching levels, erasing discrete Stack portions.
Parameters: data_keys, filters, x, y, views –
Returns: Return type: None
-
refresh(data_key, new_data_key='', new_weight=None, new_data=None, new_meta=None)
Re-run all or a portion of Stack’s aggregations for a given data key.
refresh() can be used to re-weight the data using a new case data weight variable, to re-run all aggregations based on a changed source data version (e.g. after cleaning the file or dropping cases), or a combination of both.
Note
Currently this is only supported for the preset QuantipyViews(), namely: 'cbase', 'rbase', 'counts', 'c%', 'r%', 'mean', 'ebase'.
Parameters: - data_key (str) – The Links’ data key to be modified.
- new_data_key (str, default '') – Controls if the existing data key’s files and aggregations will be overwritten or stored via a new data key.
- new_weight (str) – The name of a new weight variable used to re-aggregate the Links.
- new_data (pandas.DataFrame) – The case data source. If None is given, the original case data found for the data key will be used.
- new_meta (quantipy meta document) – A meta data source associated with the case data. If None is given, the original meta definition found for the data key will be used.
Returns: Return type: None
-
remove_data(data_keys)
Deletes the data_key(s) and associated data specified in the Stack.
Parameters: data_keys (str or list of str) – The data keys to remove.
Returns: Return type: None
-
restore_meta(data_key, filter_key=None)
Restore the .master_meta for a defined data_key if it exists.
Undo self.apply_meta_edits()
Parameters: - data_key (str) – Accessing this metadata: self[data_key].meta
- filter_key (str, default None) – Currently not implemented! Accessing this metadata: self[data_key][filter_key].meta
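The freeze, edit, restore cycle can be illustrated with a small stand-in object. This is a sketch of the general snapshot pattern, not quantipy's code: `MetaHolder`, its methods and the sample meta are hypothetical, and deep copies are an assumption made here so that edits cannot leak into the snapshot.

```python
import copy


class MetaHolder:
    """Hypothetical stand-in for the object behind self[data_key]."""

    def __init__(self, meta):
        self.meta = meta
        self.master_meta = None

    def freeze(self):
        # Keep an independent snapshot of the current meta.
        self.master_meta = copy.deepcopy(self.meta)

    def restore(self):
        # Only restore when a snapshot exists.
        if self.master_meta is not None:
            self.meta = copy.deepcopy(self.master_meta)


holder = MetaHolder({"columns": {"q1": {"text": {"en-GB": "Original label"}}}})
holder.freeze()
holder.meta["columns"]["q1"]["text"]["en-GB"] = "Edited label"
edited = holder.meta["columns"]["q1"]["text"]["en-GB"]
holder.restore()
restored = holder.meta["columns"]["q1"]["text"]["en-GB"]
```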
-
save(path_stack, compression='gzip', store_cache=True, decode_str=False, dataset=False, describe=False)
Save Stack instance to .stack file.
Parameters: - path_stack (str) – The full path to the .stack file that should be created, including the extension.
- compression ({'gzip'}, default 'gzip') – The intended compression type.
- store_cache (bool, default True) – Stores the MatrixCache in a file in the same location.
- decode_str (bool, default=False) – If True the unicoder function will be used to decode all str objects found anywhere in the meta document/s.
- dataset (bool, default=False) – If True a json/csv will be saved parallel to the saved stack for each data key in the stack.
- describe (bool, default=False) – If True the result of stack.describe().to_excel() will be saved parallel to the saved stack.
Returns: Return type: None
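For readers unfamiliar with gzip-compressed object files, the following standalone sketch shows a save and load round trip. It assumes pickle as the serializer, which the text above does not state, so it should not be read as the actual .stack file format. `save_obj`, `load_obj` and the payload are hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Write obj to a gzip-compressed file (pickle is an assumption here)."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_obj(path):
    """Read an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


payload = {"survey": {"no_filter": {"q1": {"gender": {"counts": [10, 12]}}}}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.stack")
    save_obj(payload, path)
    roundtrip = load_obj(path)
```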
-
variable_types(data_key, only_type=None, verbose=True)
Group variables by data types found in the meta.
Parameters: - data_key (str) – The reference name of a case data source held by the Stack instance.
- only_type ({'int', 'float', 'single', 'delimited set', 'string', 'date', 'time', 'array'}, optional) – Will restrict the output to the given data type.
Returns: types – A summary of variable names mapped to their data types, in form of {type_name: [variable names]} or a list of variable names conforming to only_type.
Return type: dict or list of str
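The two return shapes can be illustrated with a standalone function over a toy meta dict. This sketch assumes a meta layout of `meta['columns'][name]['type']`. The function below is a hypothetical re-implementation for illustration, not the method itself.

```python
from collections import defaultdict


def variable_types(meta, only_type=None):
    """Group variable names by their declared type."""
    grouped = defaultdict(list)
    for name, definition in meta["columns"].items():
        grouped[definition["type"]].append(name)
    if only_type is not None:
        # Return a plain list for a single requested type.
        return grouped.get(only_type, [])
    return dict(grouped)


meta = {
    "columns": {
        "age": {"type": "int"},
        "weight": {"type": "float"},
        "gender": {"type": "single"},
        "region": {"type": "single"},
    }
}

all_types = variable_types(meta)
singles = variable_types(meta, only_type="single")
```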
-