DataSet

class quantipy.DataSet(name, dimensions_comp=True)

A set of case data (required) and meta data (optional).

add_filter_var(name, logic, overwrite=False)

Create a filter variable that allows index slicing using manifest_filter().

Parameters:
  • name (str) – Name and label of the new filter variable, which is also listed in DataSet.filters
  • logic (complex logic/ str, list of complex logic/ str) – Logic to keep cases. Complex logic should be provided in the form: ` { 'label': 'any text', 'logic': {var: keys} / intersection/ .... } ` If a str (column name) is provided, a logic is created automatically that keeps all cases which are not empty for this column. If logic is a list, each included list item becomes a category of the new filter variable and all cases are kept that satisfy all conditions (intersection)
  • overwrite (bool, default False) – Overwrite an already existing filter-variable.
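
The accepted logic shapes can be illustrated as plain Python data. A hypothetical example (the real logic values use quantipy's logic helpers such as intersection; they are simplified here to {var: codes} mappings):

```python
# str shortcut: keep cases that are non-empty for this column
simple_logic = 'q1'

# complex form: a labelled logic dict
complex_logic = {'label': 'Men only', 'logic': {'gender': [1]}}

# a list of such items becomes one category per item; cases must
# satisfy all conditions (intersection) to be kept
combined = [complex_logic,
            {'label': 'Under 35', 'logic': {'age': list(range(18, 35))}}]
print([item['label'] for item in combined])
```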
add_meta(name, qtype, label, categories=None, items=None, text_key=None, replace=True)

Create and insert a well-formed meta object into the existing meta document.

Parameters:
  • name (str) – The column variable name keyed in meta['columns'].
  • qtype ({'int', 'float', 'single', 'delimited set', 'date', 'string'}) – The structural type of the data the meta describes.
  • label (str) – The text label information.
  • categories (list of str, int, or tuples in form of (int, str), default None) – When a list of str is given, the categorical values will simply be enumerated and mapped to the category labels. If only int are provided, text labels are assumed to be an empty str (‘’) and a warning is triggered. Alternatively, codes can be mapped to categorical labels, e.g.: [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
  • items (list of str, int, or tuples in form of (int, str), default None) – If provided will automatically create an array type mask. When a list of str is given, the item number will simply be enumerated and mapped to the item labels. If only int are provided, item text labels are assumed to be an empty str (‘’) and a warning is triggered. Alternatively, numerical values can be mapped explicitly to item labels, e.g.: [(1, 'The first item'), (2, 'The second item'), (99, 'Last item')]
  • text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
  • replace (bool, default True) – If True, an already existing corresponding pd.DataFrame column in the case data component will be overwritten with a new (empty) one.
Returns:

DataSet is modified inplace, meta data and _data columns will be added

Return type:

None
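
The categories argument accepts bare labels, bare codes, or explicit (code, text) tuples. A hypothetical sketch of how such inputs map to code/text pairs, mirroring the documented behaviour (this is an illustration, not quantipy's internal code):

```python
def normalize_categories(categories):
    """Map category inputs to (code, text) pairs, illustrating the
    documented behaviour of the add_meta() categories argument."""
    normalized = []
    for i, cat in enumerate(categories, start=1):
        if isinstance(cat, tuple):
            # explicit (code, text) mapping
            normalized.append(cat)
        elif isinstance(cat, int):
            # bare codes: text falls back to an empty string
            normalized.append((cat, ''))
        else:
            # bare labels: codes are enumerated starting from 1
            normalized.append((i, cat))
    return normalized

print(normalize_categories(['Elephant', 'Mouse']))
print(normalize_categories([(1, 'Elephant'), (999, 'No animal')]))
```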

align_order(vlist, align_against=None, integrate_rc=(['_rc', '_rb'], True), fix=[])

Align list to existing order.

Parameters:
  • vlist (list of str) – The list which should be reordered.
  • align_against (str or list of str, default None) – The list of variables to align against. If a string is provided, the depending set list is taken. If None, “data file” set is taken.
  • integrate_rc (tuple (list, bool)) – The provided list are the suffixes for recodes, the bool decides whether parent variables should be replaced by their recodes if the parent variable is not in vlist.
  • fix (list of str) – Variables which are fixed at the beginning of the reordered list.
all(name, codes)

Return a logical has_all() slicer for the passed codes.

Note

When applied to an array mask, the has_all() logic is extended to the item sources, i.e. it must be true for all of the items.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • codes (int or list of int) – The codes to build the logical slicer from.
Returns:

slicer – The indices fulfilling has_all([codes]).

Return type:

pandas.Index

any(name, codes)

Return a logical has_any() slicer for the passed codes.

Note

When applied to an array mask, the has_any() logic is extended to the item sources, i.e. it must be true for at least one of the items.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • codes (int or list of int) – The codes to build the logical slicer from.
Returns:

slicer – The indices fulfilling has_any([codes]).

Return type:

pandas.Index

band(name, bands, new_name=None, label=None, text_key=None)

Group numeric data with band definitions treated as group text labels.

Wrapper around derive() for quick banding of numeric data.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] that will be banded into summarized categories.
  • bands (list of int/tuple or dict mapping the former to value texts) – The categorical bands to be used. Bands can be single numeric values or ranges, e.g.: [0, (1, 10), 11, 12, (13, 20)]. By default, each band will also make up the value text of the category created in the _meta component. To specify custom texts, map each band to a category name, e.g.: [{'A': 0}, {'B': (1, 10)}, {'C': 11}, {'D': 12}, {'E': (13, 20)}]
  • new_name (str, default None) – The created variable will be named '<name>_banded', unless a desired name is provided explicitly here.
  • label (str, default None) – The created variable’s text label will be identical to the originating one’s passed in name, unless a desired label is provided explicitly here.
  • text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
Returns:

DataSet is modified inplace.

Return type:

None
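
A band definition like [0, (1, 10), 11, 12, (13, 20)] mixes single values and inclusive ranges. A small hypothetical helper showing how a numeric value matches such a definition (the real implementation goes through derive() and differs internally):

```python
def band_code(value, bands):
    """Return the 1-based category code of the band matching value.
    Single ints match exactly; (low, high) tuples are inclusive ranges."""
    for code, band in enumerate(bands, start=1):
        if isinstance(band, tuple):
            low, high = band
            if low <= value <= high:
                return code
        elif value == band:
            return code
    return None  # value falls outside all bands

bands = [0, (1, 10), 11, 12, (13, 20)]
print(band_code(0, bands))
print(band_code(7, bands))
print(band_code(15, bands))
```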

by_type(types=None)

Get an overview of all the variables ordered by their type.

Parameters:types (str or list of str, default None) – Restrict the overview to these data types.
Returns:overview – The variables per data type inside the DataSet.
Return type:pandas.DataFrame
categorize(name, categorized_name=None)

Categorize an int/string/text variable to single.

The values object of the categorized variable is populated with the unique values found in the originating variable (ignoring np.NaN / empty row entries).

Parameters:
  • name (str) – The column variable name keyed in meta['columns'] that will be categorized.
  • categorized_name (str) – If provided, the categorized variable’s new name will be drawn from here, otherwise a default name in form of 'name#' will be used.
Returns:

DataSet is modified inplace, adding the categorized variable to it.

Return type:

None

clear_factors(name)

Remove all factors set in the variable’s 'values' object.

Parameters:name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
Returns:
Return type:None
clone()

Get a deep copy of the DataSet instance.

code_count(name, count_only=None, count_not=None)

Get the total number of codes/entries found per row.

Note

Will be 0/1 for type single and range between 0 and the number of possible values for type delimited set.

Parameters:
  • name (str) – The column variable name keyed in meta['columns'] or meta['masks'].
  • count_only (int or list of int, default None) – Pass a list of codes to restrict counting to.
  • count_not (int or list of int, default None) – Pass a list of codes that should not be counted.
Returns:

count – A series with the results as ints.

Return type:

pandas.Series
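
The counting semantics can be pictured on plain Python rows, here modelling each case's responses as a list of codes (a hypothetical simplification of the actual case data storage):

```python
def count_codes(rows, count_only=None, count_not=None):
    """Count codes per row, optionally restricted to count_only or
    excluding count_not (mirrors the documented options)."""
    counts = []
    for codes in rows:
        if count_only is not None:
            codes = [c for c in codes if c in count_only]
        if count_not is not None:
            codes = [c for c in codes if c not in count_not]
        counts.append(len(codes))
    return counts

rows = [[1, 2, 3], [2], [], [1, 99]]
print(count_codes(rows))                   # total entries per row
print(count_codes(rows, count_only=[1, 2]))
print(count_codes(rows, count_not=[99]))
```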

code_from_label(name, text_label, text_key=None, exact=True, flat=True)

Return the code belonging to the passed text label (if present).

Parameters:
  • name (str) – The originating variable name keyed in meta['columns'] or meta['masks'].
  • text_label (str or list of str) – The value text(s) to search for.
  • text_key (str, default None) – The desired text_key to search through. Uses the DataSet.text_key information if not provided.
  • exact (bool, default True) – text_label must exactly match a categorical value’s text. If False, it is enough that the category contains the text_label.
  • flat (bool, default True) – If a list is passed for text_label, return all found codes as one regular list. If False, return a list of lists matching the order of the text_label list.
Returns:

codes – The list of value codes found for the passed label text.

Return type:

list

codes(name)

Get categorical data’s numerical code values.

Parameters:name (str) – The column variable name keyed in _meta['columns'].
Returns:codes – The list of category codes.
Return type:list
codes_in_data(name)

Get a list of codes that exist in data.

compare(dataset, variables=None, strict=False, text_key=None)

Compares types, codes, values, question labels of two datasets.

Parameters:
  • dataset (quantipy.DataSet instance) – Test if all variables in the provided dataset are also in self and compare their metadata definitions.
  • variables (str, list of str) – Check only these variables.
  • strict (bool, default False) – If True, lower/upper case and spaces are taken into account.
  • text_key (str, list of str) – The text_keys for which texts are compared.
Returns:

Return type:

None

compare_filter(name1, name2)

Show if filters result in the same index.

Parameters:
  • name1 (str) – Name of the first filter variable
  • name2 (str/ list of str) – Name(s) of the filter variable(s) to compare with.
convert(name, to)

Convert meta and case data between compatible variable types.

Wrapper around the separate as_TYPE() conversion methods.

Parameters:
  • name (str) – The column variable name keyed in meta['columns'] that will be converted.
  • to ({'int', 'float', 'single', 'delimited set', 'string'}) – The variable type to convert to.
Returns:

The DataSet variable is modified inplace.

Return type:

None

copy(name, suffix='rec', copy_data=True, slicer=None, copy_only=None, copy_not=None)

Copy meta and case data of the variable definition given per name.

Parameters:
  • name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
  • suffix (str, default 'rec') – The new variable name will be constructed by suffixing the original name with _suffix, e.g. 'age_rec'.
  • copy_data (bool, default True) – The new variable assumes the data of the original variable.
  • slicer (dict) – If the data is copied it is possible to filter the data with a complex logic. Example: slicer = {‘q1’: not_any([99])}
  • copy_only (int or list of int, default None) – If provided, the copied version of the variable will only contain (data and) meta for the specified codes.
  • copy_not (int or list of int, default None) – If provided, the copied version of the variable will contain (data and) meta for all codes except the indicated ones.
Returns:

DataSet is modified inplace, adding a copy to both the data and meta component.

Return type:

None

copy_array_data(source, target, source_items=None, target_items=None, slicer=None)

create_set(setname='new_set', based_on='data file', included=None, excluded=None, strings='keep', arrays='masks', replace=None, overwrite=False)

Create a new set in dataset._meta['sets'].

Parameters:
  • setname (str, default 'new_set') – Name of the new set.
  • based_on (str, default 'data file') – Name of set that can be reduced or expanded.
  • included (str or list/set/tuple of str) – Names of the variables to be included in the new set. If None all variables in based_on are taken.
  • excluded (str or list/set/tuple of str) – Names of the variables to be excluded in the new set.
  • strings ({'keep', 'drop', 'only'}, default 'keep') – Keep, drop or only include string variables.
  • arrays ({'masks', 'columns'}, default 'masks') – For arrays add masks@varname or columns@varname.
  • replace (dict) – Replace a variable in the set with another. Example: {'q1': 'q1_rec'}; 'q1' and 'q1_rec' must be included in based_on. 'q1' will be removed and 'q1_rec' will be moved to this position.
  • overwrite (bool, default False) – Overwrite if meta['sets'][name] already exists.
Returns:

The DataSet is modified inplace.

Return type:

None

crosstab(x, y=None, w=None, pct=False, decimals=1, text=True, rules=False, xtotal=False, f=None)

cut_item_texts(arrays=None)

Remove array text from array item texts.

Parameters:arrays (str, list of str, default None) – Cut texts for items of these arrays. If None, all keys in ._meta['masks'] are taken.
data()

Return the data component of the DataSet instance.

derive(name, qtype, label, cond_map, text_key=None)

Create meta and recode case data by specifying derived category logics.

Parameters:
  • name (str) – The column variable name keyed in meta['columns'].
  • qtype ({'int', 'float', 'single', 'delimited set'}) – The structural type of the data the meta describes.
  • label (str) – The text label information.
  • cond_map (list of tuples) –

    Tuples of either two or three elements of following structures:

    2 elements, no labels provided: (code, <qp logic expression here>), e.g.: (1, intersection([{'gender': [1]}, {'age': frange('30-40')}]))

    2 elements, no codes provided: (‘text label’, <qp logic expression here>), e.g.: ('Cat 1', intersection([{'gender': [1]}, {'age': frange('30-40')}]))

    3 elements, with codes + labels: (code, ‘Label goes here’, <qp logic expression here>), e.g.: (1, 'Men, 30 to 40', intersection([{'gender': [1]}, {'age': frange('30-40')}]))

  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
Returns:

DataSet is modified inplace.

Return type:

None
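
For a feel of how a cond_map assigns codes, here is a hypothetical evaluator restricted to the simplest logic form, a plain {variable: [codes]} mapping (quantipy's real logic additionally supports intersection, frange and other helpers):

```python
def apply_cond_map(row, cond_map):
    """Return the first code whose simple {var: [codes]} condition
    matches the row dict; None if nothing matches (illustrative only)."""
    for code, label, logic in cond_map:
        if all(row.get(var) in codes for var, codes in logic.items()):
            return code
    return None

cond_map = [
    (1, 'Men 18-34',   {'gender': [1], 'agegrp': [1]}),
    (2, 'Women 18-34', {'gender': [2], 'agegrp': [1]}),
]
print(apply_cond_map({'gender': 1, 'agegrp': 1}, cond_map))
print(apply_cond_map({'gender': 2, 'agegrp': 2}, cond_map))
```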

derotate(levels, mapper, other=None, unique_key='identity', dropna=True)

Derotate data and meta using the given mapper, and appending others.

This function derotates data using the specification defined in mapper, which is a list of dicts of lists, describing how columns from data can be read as a hierarchical structure.

Returns derotated DataSet instance and saves data and meta as json and csv.

Parameters:
  • levels (dict) – The name and values of a new column variable to identify cases.
  • mapper (list of dicts of lists) –

    A list of dicts matching where the new column names are keys to to lists of source columns. Example:

    >>> mapper = [{'q14_1': ['q14_1_1', 'q14_1_2', 'q14_1_3']},
    ...           {'q14_2': ['q14_2_1', 'q14_2_2', 'q14_2_3']},
    ...           {'q14_3': ['q14_3_1', 'q14_3_2', 'q14_3_3']}]
    
  • other (list (optional; default=None)) – A list of additional columns from the source data to be appended to the end of the resulting stacked dataframe.
  • unique_key (str) – Name of column variable that will be copied to the new dataset.
  • dropna (boolean (optional; default=True)) – Passed through to the pandas.DataFrame.stack() operation.
Returns:

Return type:

new qp.DataSet instance
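
The mapper-driven stacking can be sketched in plain Python: each dict maps a new column to its source columns, and every case is expanded into one row per level (a hypothetical simplification; the real derotate() also builds the accompanying meta):

```python
def derotate_rows(rows, mapper, unique_key='identity'):
    """Stack wide rows into long rows following the mapper
    (hypothetical sketch of the reshaping step only)."""
    # the number of levels equals the length of the source lists
    n_levels = len(next(iter(mapper[0].values())))
    stacked = []
    for row in rows:
        for level in range(n_levels):
            new_row = {unique_key: row[unique_key], 'level': level + 1}
            for entry in mapper:
                for new_col, sources in entry.items():
                    new_row[new_col] = row[sources[level]]
            stacked.append(new_row)
    return stacked

mapper = [{'q14_1': ['q14_1_1', 'q14_1_2']},
          {'q14_2': ['q14_2_1', 'q14_2_2']}]
rows = [{'identity': 7, 'q14_1_1': 1, 'q14_1_2': 2,
         'q14_2_1': 3, 'q14_2_2': 4}]
print(derotate_rows(rows, mapper))
```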

describe(var=None, only_type=None, text_key=None, axis_edit=None)

Inspect the DataSet’s global or variable level structure.

dichotomize(name, value_texts=None, keep_variable_text=True, ignore=None, replace=False, text_key=None)

dimensionize(names=None)

Rename the dataset columns for Dimensions compatibility.

dimensionizing_mapper(names=None)

Return a renaming dataset mapper for dimensionizing names.

Parameters:None
Returns:mapper – A renaming mapper in the form of a dict of {old: new} that maps non-Dimensions naming conventions to Dimensions naming conventions.
Return type:dict
drop(name, ignore_items=False)

Drops variables from meta and data components of the DataSet.

Parameters:
  • name (str or list of str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • ignore_items (bool) – If False, source variables for arrays in _meta['columns'] are dropped as well; otherwise they are kept.
Returns:

DataSet is modified inplace.

Return type:

None

drop_duplicates(unique_id='identity', keep='first', sort_by=None)

Drop duplicated cases from self._data.

Parameters:
  • unique_id (str) – Variable name that gets scanned for duplicates.
  • keep (str, {'first', 'last'}) – Keep first or last of the duplicates.
  • sort_by (str) – Name of a variable to sort the data by, for example “endtime”. It is a helper to specify keep.
duplicates(name='identity')

Returns a list with duplicated values for the provided name.

Parameters:name (str, default 'identity') – The column variable name keyed in meta['columns'].
Returns:vals – A list of duplicated values found in the named variable.
Return type:list
empty(name, condition=None)

Check variables for emptiness (opt. restricted by a condition).

Parameters:
  • name ((list of) str) – The variable name keyed in _meta['columns'].
  • condition (Quantipy logic expression, default None) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is to be considered.
Returns:

empty

Return type:

bool

empty_items(name, condition=None, by_name=True)

Test arrays for item emptiness (opt. restricted by a condition).

Parameters:
  • name ((list of) str) – The mask variable name keyed in _meta['masks'].
  • condition (Quantipy logic expression, default None) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is to be considered.
  • by_name (bool, default True) – Return array items by their name or their index.
Returns:

empty – The list of empty items by their source names or positional index (starting from 1!), mapped to their parent mask name if more than one mask is tested.

Return type:

list

extend_filter_var(name, logic, extend_as=None)

Extend logic of an existing filter-variable.

Parameters:
  • name (str) – Name of the existing filter variable.
  • logic ((list of) complex logic/ str) – Additional logic to keep cases (intersection with existing logic). Complex logic should be provided in form of: ` { 'label': 'any text', 'logic': {var: keys} / intersection/ .... } `
  • extend_as (str, default None) – Addition to the filter-name to create a new filter. If it is None the existing filter-variable is overwritten.
extend_items(name, ext_items, text_key=None)

Extend mask items of an existing array.

Parameters:
  • name (str) – The originating column variable name keyed in meta['masks'].
  • ext_items (list of str/ list of dict) – The labels of the new items. Each can be provided as str, in which case the new column is named by the grid and the item_no, or as a dict {'new_column': 'label'}.
  • text_key (str/ list of str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
extend_values(name, ext_values, text_key=None, safe=True)

Add to the ‘values’ object of existing column or mask meta data.

Attempting to add already existing value codes or providing already present value texts will both raise a ValueError!

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • ext_values (list of str or tuples in form of (int, str), default None) – When a list of str is given, the categorical values will simply be enumerated and mapped to the category labels. Alternatively, codes can be mapped to categorical labels, e.g.: [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
  • safe (bool, default True) – If set to False, duplicate value texts are allowed when extending the values object.
Returns:

The DataSet is modified inplace.

Return type:

None

factors(name)

Get categorical data’s stat. factor values.

Parameters:name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
Returns:factors – A {value: factor} mapping.
Return type:OrderedDict
filter(alias, condition, inplace=False)

Filter the DataSet using a Quantipy logical expression.

find(str_tags=None, suffixed=False)

Find variables by searching their names for substrings.

Parameters:
  • str_tags ((list of) str) – The string tags to look for in the variable names. If not provided, the module’s default global list of substrings from VAR_SUFFIXES will be used.
  • suffixed (bool, default False) – If set to True, only variable names that end with a given string sequence will qualify.
Returns:

found – The list of matching variable names.

Return type:

list

find_duplicate_texts(name, text_key=None)

Collect values that share the same text information to find duplicates.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
first_responses(name, n=3, others='others', reduce_values=False)

Create n-first mentions from the set of responses of a delimited set.

Parameters:
  • name (str) – The column variable name of a delimited set keyed in meta['columns'].
  • n (int, default 3) – The number of mentions that will be turned into single-type variables, i.e. 1st mention, 2nd mention, 3rd mention, 4th mention, etc.
  • others (None or str, default 'others') – If provided, all remaining values will end up in a new delimited set variable reduced by the responses transferred to the single mention variables.
  • reduce_values (bool, default False) – If True, each new variable will only list the categorical value metadata for the codes found in the respective data vector, i.e. not the initial full codeframe.
Returns:

DataSet is modified inplace.

Return type:

None

flatten(name, codes, new_name=None, text_key=None)

Create a variable that groups array mask item answers to categories.

Parameters:
  • name (str) – The array variable name keyed in meta['masks'] that will be converted.
  • codes (int, list of int) – The answer codes that determine the categorical grouping. Item labels will become the category labels.
  • new_name (str, default None) – The name of the new delimited set variable. If None, name is suffixed with ‘_rec’.
  • text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
Returns:

The DataSet is modified inplace, delimited set variable is added.

Return type:

None
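
The grouping can be pictured on a single case: every array item whose value is among the target codes contributes its (1-based) item number as a category of the new delimited set (a hypothetical sketch of the per-row logic):

```python
def flatten_row(item_values, codes):
    """Return the 1-based item numbers whose value is in codes,
    i.e. the categories of the new delimited set (illustrative)."""
    return [no for no, val in enumerate(item_values, start=1)
            if val in codes]

# three array items answered for one case; keep items answered 1 or 2
print(flatten_row([1, 3, 2], codes=[1, 2]))
```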

force_texts(copy_to=None, copy_from=None, update_existing=False)

Copy info from existing text_key to a new one or update the existing one.

Parameters:
  • copy_to (str, default None) – The text key that will be filled, e.g. one of {‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’}. If None, _meta['lib']['default text'] is used.
  • copy_from (str / list) – The text key(s) to copy from, e.g. one of {‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’}. A list of text_keys can also be given; if the first text_key doesn’t exist, the next one is taken.
  • update_existing (bool) – If True, copy_to will be filled in any case. If False, copy_to will only be filled if it is empty or does not exist.
Returns:

Return type:

None

from_batch(batch_name, include='identity', text_key=[], apply_edits=True, additions='variables')

Get a filtered subset of the DataSet using qp.Batch definitions.

Parameters:
  • batch_name (str) – Name of a Batch included in the DataSet.
  • include (str/ list of str) – Name of variables that get included even if they are not in Batch.
  • text_key (str/ list of str, default None) – Take over all texts of the included text_key(s), if None is provided all included text_keys are taken.
  • apply_edits (bool, default True) – meta_edits and rules are used as/ applied on global meta of the new DataSet instance.
  • additions ({'variables', 'filters', 'full', None}) – Extend included variables by the xks, yks and weights of the additional batches if set to ‘variables’, ‘filters’ will create new 1/0-coded variables that reflect any filters defined. Selecting ‘full’ will do both, None will ignore additional Batches completely.
Returns:

b_ds

Return type:

quantipy.DataSet

from_components(data_df, meta_dict=None, reset=True, text_key=None)

Attach data and meta directly to the DataSet instance.

Note

Apart from testing for appropriate object types, this method offers no additional safeguards or consistency/compatibility checks with regard to the passed data and meta documents!

Parameters:
  • data_df (pandas.DataFrame) – A DataFrame that contains case data entries for the DataSet.
  • meta_dict (dict, default None) – A dict that stores meta data describing the columns of the data_df. It is assumed to be well-formed following the Quantipy meta data structure.
  • reset (bool, default True) – Clean the ‘lib’ and 'sets' metadata collections from non-native entries, e.g. user-defined information or helper metadata.
  • text_key (str, default None) – The text_key to be used. If not provided, it will be attempted to use the ‘default text’ from the meta['lib'] definition.
Returns:

Return type:

None

from_excel(path_xlsx, merge=True, unique_key='identity')

Converts an Excel file to a DataSet and/or merges variables.

Parameters:
  • path_xlsx (str) – Path where the excel file is stored. The file must have exactly one sheet with data.
  • merge (bool) – If True, the new data from the Excel file will be merged onto the dataset.
  • unique_key (str) – If merge=True, an hmerge is performed on this variable.
Returns:

new_dataset – Contains only the data from excel. If merge=True dataset is modified inplace.

Return type:

quantipy.DataSet

from_stack(stack, data_key=None, dk_filter=None, reset=True)

Use quantipy.Stack data and meta to create a DataSet instance.

Parameters:
  • stack (quantipy.Stack) – The Stack instance to convert.
  • data_key (str) – The reference name where meta and data information are stored.
  • dk_filter (string, default None) – Filter name if the stack contains more than one filter. If None, ‘no_filter’ will be used.
  • reset (bool, default True) – Clean the ‘lib’ and 'sets' metadata collections from non-native entries, e.g. user-defined information or helper metadata.
Returns:

Return type:

None

fully_hidden_arrays()

Get all array definitions that contain only hidden items.

Returns:hidden – The list of array mask names.
Return type:list
get_batch(name)

Get existing Batch instance from DataSet meta information.

Parameters:name (str) – Name of existing Batch instance.
get_property(name, prop_name, text_key=None)

hide_empty_items(condition=None, arrays=None)

Apply rules meta to automatically hide empty array items.

Parameters:
  • condition (Quantipy logic expression, default None) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is to be considered.
  • arrays ((list of) str, default None) – The array mask variable names keyed in _meta['masks']. If not explicitly provided, all array mask definitions are tested.
Returns:

Return type:

None

hiding(name, hide, axis='y', hide_values=True)

Set or update rules[axis]['dropx'] meta for the named column.

Quantipy builds will respect the hidden codes and cut them from results.

Note

This is not equivalent to DataSet.set_missings() as missing values are respected also in computations.

Parameters:
  • name (str or list of str) – The column variable(s) name keyed in _meta['columns'].
  • hide (int or list of int) – Values indicated by their int codes will be dropped from Quantipy.View.dataframes.
  • axis ({'x', 'y'}, default 'y') – The axis to drop the values from.
  • hide_values (bool, default True) – Only considered if name refers to a mask. If True, values are hidden on all mask items. If False, mask items are hidden by position (only for array summaries).
Returns:

Return type:

None

hmerge(dataset, on=None, left_on=None, right_on=None, overwrite_text=False, from_set=None, inplace=True, update_existing=None, merge_existing=None, text_properties=None, verbose=True)

Merge Quantipy datasets together using an index-wise identifier.

This function merges two Quantipy datasets together, updating variables that exist in the left dataset and appending others. New variables will be appended in the order indicated by the ‘data file’ set if found, otherwise they will be appended in alphanumeric order. This merge happens horizontally (column-wise). Packed kwargs will be passed on to the pandas.DataFrame.merge() method call, but that merge will always happen using how='left'.

Parameters:
  • dataset (quantipy.DataSet) – The dataset to merge into the current DataSet.
  • on (str, default=None) – The column to use as a join key for both datasets.
  • left_on (str, default=None) – The column to use as a join key for the left dataset.
  • right_on (str, default=None) – The column to use as a join key for the right dataset.
  • overwrite_text (bool, default=False) – If True, text_keys in the left meta that also exist in right meta will be overwritten instead of ignored.
  • from_set (str, default=None) – Use a set defined in the right meta to control which columns are merged from the right dataset.
  • inplace (bool, default True) – If True, the DataSet will be modified inplace with new/updated columns. Will return a new DataSet instance if False.
  • update_existing (str/ list of str, default None, {'all', [var_names]}) – Update values for defined delimited sets if they exist in both datasets.
  • text_properties (str/ list of str, default=None, {'all', [var_names]}) – Controls the update of the dataset_left properties with properties from the dataset_right. If None, properties from dataset_left will be updated by the ones from the dataset_right. If ‘all’, properties from dataset_left will be kept unchanged. Otherwise, specify the list of properties which will be kept unchanged in the dataset_left; all others will be updated by the properties from dataset_right.
  • verbose (bool, default=True) – Echo progress feedback to the output pane.
Returns:

None or new_dataset – If the merge is not applied inplace, a DataSet instance is returned.

Return type:

quantipy.DataSet

interlock(name, label, variables, val_text_sep='/')

Build a new category-intersected variable from >=2 incoming variables.

Parameters:
  • name (str) – The new column variable name keyed in _meta['columns'].
  • label (str) – The new text label for the created variable.
  • variables (list of >= 2 str or dict (mapper)) –

    The column names of the variables that are feeding into the intersecting recode operation. Or dicts/mapper to create temporary variables for interlock. Can also be a mix of str and dict. Example:

    >>> ['gender',
    ...  {'agegrp': [(1, '18-34', {'age': frange('18-34')}),
    ...              (2, '35-54', {'age': frange('35-54')}),
    ...              (3, '55+', {'age': is_ge(55)})]},
    ...  'region']
    
  • val_text_sep (str, default '/') – The passed character (or any other str value) will be used to separate the incoming individual value texts to make up the intersected category value texts, e.g.: ‘Female/18-30/London’.
Returns:

Return type:

None
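
The intersected category texts can be previewed with a cartesian product over the incoming variables' value texts, joined by val_text_sep (a hypothetical sketch of the text construction only):

```python
from itertools import product

def interlock_texts(value_texts, sep='/'):
    """Build intersected category texts from lists of value texts,
    mirroring the documented val_text_sep behaviour (illustrative)."""
    return [sep.join(combo) for combo in product(*value_texts)]

texts = interlock_texts([['Male', 'Female'], ['18-34', '35+']])
print(texts)
```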

is_like_numeric(name)

Test if a string-typed variable can be expressed numerically.

Parameters:name (str) – The column variable name keyed in _meta['columns'].
Returns:
Return type:bool
is_nan(name)

Detect empty entries in the _data rows.

Parameters:name (str) – The column variable name keyed in meta['columns'].
Returns:count – A series with the results as bool.
Return type:pandas.Series
is_subfilter(name1, name2)

Verify if index of name2 is part of the index of name1.

item_no(name)

Return the order/position number of passed array item variable name.

Parameters:name (str) – The column variable name keyed in _meta['columns'].
Returns:no – The positional index of the item (starting from 1).
Return type:int
item_texts(name, text_key=None, axis_edit=None)

Get the text meta data for the items of the passed array mask name.

Parameters:
  • name (str) – The mask variable name keyed in _meta['masks'].
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

texts – The list of item texts for the array elements.

Return type:

list

items(name, text_key=None, axis_edit=None)

Get the array’s paired item names and texts information from the meta.

Parameters:
  • name (str) – The column variable name keyed in _meta['masks'].
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

items – The list of source item names (from _meta['columns']) and their text information packed as tuples.

Return type:

list of tuples

Create a Link instance from the DataSet.

manifest_filter(name)

Get index slicer from filter-variables.

Parameters:name (str) – Name of the filter_variable.
merge_texts(dataset)

Add additional text versions from other text_key meta.

Case data will be ignored during the merging process.

Parameters:dataset ((A list of multiple) quantipy.DataSet) – One or multiple datasets that provide new text_key meta.
Returns:
Return type:None
meta(name=None, text_key=None, axis_edit=None)

Provide a pretty summary for variable meta given as per name.

Parameters:
  • name (str, default None) – The variable name keyed in _meta['columns'] or _meta['masks']. If None, the entire meta component of the DataSet instance will be returned.
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

meta – Either a DataFrame that sums up the meta information on a mask or column or the meta dict as a whole is

Return type:

dict or pandas.DataFrame

meta_to_json(key=None, collection=None)

Save a meta object as json file.

Parameters:
  • key (str, default None) – Name of the variable whose metadata is saved. If key is not provided, the included collection or the whole meta is saved.
  • collection (str {'columns', 'masks', 'sets', 'lib'}, default None) – The meta object is taken from this collection.
Returns:

Return type:

None

min_value_count(name, min=50, weight=None, condition=None, axis='y', verbose=True)

Wrapper for self.hiding(), which hides values with low value counts.

Parameters:
  • name (str/ list of str) – Name(s) of the variable(s) whose values are checked against the defined border.
  • min (int) – If the amount of counts for a value is below this number, the value is hidden.
  • weight (str, default None) – Name of the weight that is used to calculate the weighted counts.
  • condition (complex logic) – The data used to calculate the counts can be filtered by the included condition.
  • axis ({'y', 'x', ['x', 'y']}, default 'y') – The axis on which the values are hidden.
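The thresholding idea can be illustrated with plain pandas (codes_below_min is a hypothetical helper, not the Quantipy implementation):

```python
import pandas as pd

def codes_below_min(series, min_count=50, weight=None):
    # Count (optionally weighted) occurrences per code and return the
    # codes whose count falls below the border -- the candidates that
    # min_value_count would pass on to hiding().
    if weight is not None:
        counts = weight.groupby(series).sum()
    else:
        counts = series.value_counts()
    return sorted(counts[counts < min_count].index)

answers = pd.Series([1, 1, 1, 2, 3, 3])
hide = codes_below_min(answers, min_count=2)
```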
names(ignore_items=True)

Find all weak-duplicate variable names that are different only by case.

Note

Will return self.variables() if no weak-duplicates are found.

Returns:weak_dupes – An overview of case-sensitive spelling differences in otherwise equal variable names.
Return type:pd.DataFrame
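The notion of a weak duplicate (names equal up to case) can be sketched in plain Python:

```python
from collections import defaultdict

def weak_duplicates(names):
    # Group names case-insensitively and keep only groups that contain
    # more than one spelling.
    groups = defaultdict(list)
    for n in names:
        groups[n.lower()].append(n)
    return {k: v for k, v in groups.items() if len(v) > 1}

dupes = weak_duplicates(['Gender', 'gender', 'age', 'Q1'])
```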
order(new_order=None, reposition=None, regroup=False)

Set the global order of the DataSet variables collection.

The global order of the DataSet is reflected in the data component’s pd.DataFrame.columns order and the variable references in the meta component’s ‘data file’ items.

Parameters:
  • new_order (list) – A list of all DataSet variables in the desired order.
  • reposition ((List of) dict) – Each dict maps one or a list of variables to a reference variable name key. The mapped variables are moved before the reference key.
  • regroup (bool, default False) – Attempt to regroup non-native variables (i.e. created either manually with add_meta(), recode(), derive(), etc. or automatically by manifesting qp.View objects) with their originating variables.
Returns:

Return type:

None

parents(name)

Get the parent meta information for masks-structured column elements.

Parameters:name (str) – The mask variable name keyed in _meta['columns'].
Returns:parents – The list of parents the _meta['columns'] variable is attached to.
Return type:list
populate(batches='all', verbose=True)

Create a qp.Stack based on all available qp.Batch definitions.

Parameters:batches (str/ list of str) – Name(s) of qp.Batch instances that are used to populate the qp.Stack.
Returns:
Return type:qp.Stack
read_ascribe(path_meta, path_data, text_key)

Load Dimensions .xml/.txt files, connecting as data and meta components.

Parameters:
  • path_meta (str) – The full path (optionally with extension '.xml', otherwise assumed as such) to the meta data defining '.xml' file.
  • path_data (str) – The full path (optionally with extension '.txt', otherwise assumed as such) to the case data defining '.txt' file.
Returns:

The DataSet is modified inplace, connected to Quantipy data and meta components that have been converted from their Ascribe source files.

Return type:

None

read_dimensions(path_meta, path_data)

Load Dimensions .ddf/.mdd files, connecting as data and meta components.

Parameters:
  • path_meta (str) – The full path (optionally with extension '.mdd', otherwise assumed as such) to the meta data defining '.mdd' file.
  • path_data (str) – The full path (optionally with extension '.ddf', otherwise assumed as such) to the case data defining '.ddf' file.
Returns:

The DataSet is modified inplace, connected to Quantipy data and meta components that have been converted from their Dimensions source files.

Return type:

None

read_quantipy(path_meta, path_data, reset=True)

Load Quantipy .csv/.json files, connecting as data and meta components.

Parameters:
  • path_meta (str) – The full path (optionally with extension '.json', otherwise assumed as such) to the meta data defining '.json' file.
  • path_data (str) – The full path (optionally with extension '.csv', otherwise assumed as such) to the case data defining '.csv' file.
  • reset (bool, default True) – Clean the ‘lib’ and 'sets' metadata collections from non-native entries, e.g. user-defined information or helper metadata.
Returns:

The DataSet is modified inplace, connected to Quantipy native data and meta components.

Return type:

None

read_spss(path_sav, **kwargs)

Load SPSS Statistics .sav files, converting and connecting data/meta.

Parameters:path_sav (str) – The full path (optionally with extension '.sav', otherwise assumed as such) to the '.sav' file.
Returns:The DataSet is modified inplace, connected to Quantipy data and meta components that have been converted from the SPSS source file.
Return type:None
recode(target, mapper, default=None, append=False, intersect=None, initialize=None, fillna=None, inplace=True)

Create a new or copied series from data, recoded using a mapper.

This function takes a mapper of {key: logic} entries and injects the key into the target column where its paired logic is True. The logic may be arbitrarily complex and may refer to any other variable or variables in data. Where a pre-existing column has been used to start the recode, the injected values can replace or be appended to any data found there to begin with. Note that this function does not edit the target column; it returns a recoded copy of the target column. The recoded data will always comply with the column type indicated for the target column according to the meta.

Parameters:
  • target (str) – The column variable name keyed in _meta['columns'] that is the target of the recode. If not found in _meta this will fail with an error. If target is not found in data.columns the recode will start from an empty series with the same index as _data. If target is found in data.columns the recode will start from a copy of that column.
  • mapper (dict) – A mapper of {key: logic} entries.
  • default (str, default None) – The column name to default to in cases where unattended lists are given in your logic, where an auto-transformation of {key: list} to {key: {default: list}} is provided. Note that lists in logical statements are themselves a form of shorthand and this will ultimately be interpreted as: {key: {default: has_any(list)}}.
  • append (bool, default False) – Should the new recoded data be appended to values already found in the series? If False, data from series (where found) will overwrite whatever was found for that item instead.
  • intersect (logical statement, default None) – If a logical statement is given here then it will be used as an implied intersection of all logical conditions given in the mapper.
  • initialize (str or np.NaN, default None) – If not None, a copy of the data named column will be used to populate the target column before the recode is performed. Alternatively, initialize can be used to populate the target column with np.NaNs (overwriting whatever may be there) prior to the recode.
  • fillna (int, default=None) – If not None, the value passed to fillna will be used on the recoded series as per pandas.Series.fillna().
  • inplace (bool, default True) – If True, the DataSet will be modified inplace with new/updated columns. Will return a new recoded pandas.Series instance if False.
Returns:

Either the DataSet._data is modified inplace or a new pandas.Series is returned.

Return type:

None or recode_series
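A stripped-down illustration of the {key: logic} injection for a single-coded target, using boolean pandas masks in place of Quantipy logic (simple_recode is a sketch, not the actual method):

```python
import numpy as np
import pandas as pd

def simple_recode(index, mapper):
    # Start from an empty series and inject each key wherever its
    # boolean mask is True (the last matching key wins in this sketch).
    target = pd.Series(np.nan, index=index)
    for key, mask in mapper.items():
        target[mask] = key
    return target

age = pd.Series([25, 40, 60])
agegrp = simple_recode(age.index, {1: age < 35,
                                   2: (age >= 35) & (age < 55),
                                   3: age >= 55})
```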

reduce_filter_var(name, values)

Remove values from filter-variables and recalculate the filter.

remove_html()

Cycle through all meta text objects removing html tags.

Currently uses the regular expression ‘<.*?>’ in _remove_html() classmethod.

Returns:
Return type:None
remove_items(name, remove)

Erase array mask items safely from both meta and case data components.

Parameters:
  • name (str) – The originating column variable name keyed in meta['masks'].
  • remove (int or list of int) – The items listed by their order number in the _meta['masks'][name]['items'] object will be dropped from the mask definition.
Returns:

DataSet is modified inplace.

Return type:

None

remove_values(name, remove)

Erase value codes safely from both meta and case data components.

Attempting to remove all value codes from the variable’s value object will raise a ValueError!

Parameters:
  • name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
  • remove (int or list of int) – The codes to be removed from the DataSet variable.
Returns:

DataSet is modified inplace.

Return type:

None

rename(name, new_name)

Change meta and data column name references of the variable definition.

Parameters:
  • name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
  • new_name (str) – The new variable name.
Returns:

DataSet is modified inplace. The new name reference replaces the original one.

Return type:

None

rename_from_mapper(mapper, keep_original=False, ignore_batch_props=False)

Rename meta objects and data columns using mapper.

Parameters:mapper (dict) – A renaming mapper in the form of a dict of {old: new} that will be used to rename columns throughout the meta and data.
Returns:DataSet is modified inplace.
Return type:None
reorder_items(name, new_order)

Apply a new order to mask items.

Parameters:
  • name (str) – The variable name keyed in _meta['masks'].
  • new_order (list of int, default None) – The new order of the mask items. The included ints match up to the number of the items (DataSet.item_no('item_name')).
Returns:

DataSet is modified inplace.

Return type:

None

reorder_values(name, new_order=None)

Apply a new order to the value codes defined by the meta data component.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • new_order (list of int, default None) – The new code order of the DataSet variable. If no order is given, the values object is sorted ascending.
Returns:

DataSet is modified inplace.

Return type:

None

repair()

Try to fix legacy meta data inconsistencies and badly shaped array / datafile items 'sets' meta definitions.

repair_text_edits(text_key=None)

Cycle through all meta text objects repairing axis edits.

Parameters:text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys for which text edits should be included.
Returns:
Return type:None
replace_texts(replace, text_key=None)

Cycle through all meta text objects replacing unwanted strings.

Parameters:
  • replace (dict) – A dictionary mapping {unwanted string: replacement string}.
  • text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys for which unwanted strings are replaced.
Returns:

Return type:

None

resolve_name(name)
restore_item_texts(arrays=None)

Restore array item texts.

Parameters:arrays (str, list of str, default None) – Restore texts for items of these arrays. If None, all keys in ._meta['masks'] are taken.
revert()

Return to a previously saved state of the DataSet.

Note

This method is designed primarily for use in interactive Python environments like iPython/Jupyter and their notebook applications.

roll_up(varlist, ignore_arrays=None)

Replace any array items with their parent mask variable definition name.

Parameters:
  • varlist (list) – A list of meta 'columns' and/or 'masks' names.
  • ignore_arrays ((list of) str) – A list of array mask names that should not be rolled up if their items are found inside varlist.

Note

varlist can also contain nesting var1 > var2.

Returns:rolled_up – The modified varlist.
Return type:list
save()

Save the current state of the DataSet’s data and meta.

The saved file will be temporarily stored inside the cache. Use this to take a snapshot of the DataSet state to easily revert back to at a later stage.

Note

This method is designed primarily for use in interactive Python environments like iPython/Jupyter notebook applications.

select_text_keys(text_key=None)

Cycle through all meta text objects keep only selected text_key.

Parameters:text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys which should be kept.
Returns:
Return type:None
classmethod set_encoding(encoding)

Hack sys.setdefaultencoding() to escape ASCII hell.

Parameters:encoding (str) – The name of the encoding to default to.
set_factors(name, factormap, safe=False)

Apply numerical factors to (single-type categorical) variables.

Factors can be read while aggregating descriptive statistics qp.Views.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • factormap (dict) – A mapping of {value: factor} (int to int).
  • safe (bool, default False) – Set to True to prevent setting factors to the values meta data of non-single type variables.
Returns:

Return type:

None

set_item_texts(name, renamed_items, text_key=None, axis_edit=None)

Rename or add item texts in the items objects of masks.

Parameters:
  • name (str) – The column variable name keyed in _meta['masks'].
  • renamed_items (dict) –

    A dict mapping with following structure (array mask items are assumed to be passed by their order number):

    >>> {1: 'new label for item #1',
    ...  5: 'new label for item #5'}
    
  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
  • axis_edit ({'x', 'y', ['x', 'y']}, default None) – If the new text of the variable should only be considered temporarily for build exports, the axes on which the edited text should appear can be provided.
Returns:

The DataSet is modified inplace.

Return type:

None

set_missings(var, missing_map='default', hide_on_y=True, ignore=None)

Flag category definitions for exclusion in aggregations.

Parameters:
  • var (str or list of str) – Variable(s) to apply the meta flags to.
  • missing_map ('default' or list of codes or dict of {'flag': code(s)}, default 'default') – A mapping of codes to flags that can either be ‘exclude’ (globally ignored) or ‘d.exclude’ (only ignored in descriptive statistics). Codes provided in a list are flagged as ‘exclude’. Passing ‘default’ uses a preset list of (TODO: specify) values for exclusion.
  • ignore (str or list of str, default None) – A list of variables that should be ignored when applying missing flags via the ‘default’ list method.
Returns:

Return type:

None

set_property(name, prop_name, prop_value, ignore_items=False)

Access and set the value of a meta object’s properties collection.

Parameters:
  • name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
  • prop_name (str) – The property key name.
  • prop_value (any) – The value to be set for the property. Must be of valid type and have allowed values(s) with regard to the property.
  • ignore_items (bool, default False) – When name refers to a variable from the 'masks' collection, setting to True will ignore any items and only apply the property to the mask itself.
Returns:

Return type:

None

set_text_key(text_key)

Set the default text_key of the DataSet.

Note

A lot of the instance methods will fall back to the default text key in _meta['lib']['default text']. It is therefore important to use this method with caution, i.e. ensure that the meta contains text entries for the text_key set.

Parameters:text_key ({'en-GB', 'da-DK', 'fi-FI', 'nb-NO', 'sv-SE', 'de-DE'}) – The text key that will be set in _meta['lib']['default text'].
Returns:
Return type:None
set_value_texts(name, renamed_vals, text_key=None, axis_edit=None)

Rename or add value texts in the ‘values’ object.

This method works for array masks and column meta data.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • renamed_vals (dict) – A dict mapping with following structure: {1: 'new label for code=1', 5: 'new label for code=5'} Codes will be ignored if they do not exist in the ‘values’ object.
  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
  • axis_edit ({'x', 'y', ['x', 'y']}, default None) – If renamed_vals should only be considered temporarily for build exports, the axes on which the edited text should appear can be provided.
Returns:

The DataSet is modified inplace.

Return type:

None

set_variable_text(name, new_text, text_key=None, axis_edit=None)

Apply a new or update a column’s/masks’ meta text object.

Parameters:
  • name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
  • new_text (str) – The text (label) to be set.
  • text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
  • axis_edit ({'x', 'y', ['x', 'y']}, default None) – If the new_text of the variable should only be considered temporarily for build exports, the axes on which the edited text should appear can be provided.
Returns:

The DataSet is modified inplace.

Return type:

None

set_verbose_errmsg(verbose=True)
set_verbose_infomsg(verbose=True)
slicing(name, slicer, axis='y')

Set or update rules[axis]['slicex'] meta for the named column.

Quantipy builds will respect the kept codes and show them exclusively in results.

Note

This is not a replacement for DataSet.set_missings() as missing values are respected also in computations.

Parameters:
  • name (str or list of str) – The column variable(s) name keyed in _meta['columns'].
  • slicer (int or list of int) – Values indicated by their int codes will be shown in Quantipy.View.dataframes, respecting the provided order.
  • axis ({'x', 'y'}, default 'y') – The axis to slice the values on.
Returns:

Return type:

None

sorting(name, on='@', within=False, between=False, fix=None, ascending=False, sort_by_weight='auto')

Set or update rules['x']['sortx'] meta for the named column.

Parameters:
  • name (str or list of str) – The column variable(s) name keyed in _meta['columns'].
  • within (bool, default False) – Applies only to variables that have been aggregated by creating an expand grouping / overcode-style View: If True, will sort frequencies inside each group.
  • between (bool, default False) – Applies only to variables that have been aggregated by creating an expand grouping / overcode-style View: If True, will sort group and regular code frequencies with regard to each other.
  • fix (int or list of int, default None) – Values indicated by their int codes will be ignored in the sorting operation.
  • ascending (bool, default False) – By default frequencies are sorted in descending order. Specify True to sort ascending.
Returns:

Return type:

None

sources(name)

Get the _meta['columns'] elements for the passed array mask name.

Parameters:name (str) – The mask variable name keyed in _meta['masks'].
Returns:sources – The list of source elements from the array definition.
Return type:list
split(save=False)

Return the meta and data components of the DataSet instance.

Parameters:save (bool, default False) – If True, the meta and data objects will be saved to disk, using the instance’s name and path attributes to determine the file location.
Returns:meta, data – The meta dict and the case data DataFrame as separate objects.
Return type:dict, pandas.DataFrame
static start_meta(text_key='main')

Starts a new/empty Quantipy meta document.

Parameters:text_key (str, default 'main') – The default text key to be set into the new meta document.
Returns:meta – Quantipy meta object
Return type:dict
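The returned document is a nested dict; the skeleton below approximates its layout as an illustration only (the exact keys beyond 'columns', 'masks', 'sets' and 'lib', which are referenced throughout this page, are assumptions):

```python
def start_meta_sketch(text_key='main'):
    # Approximate shape of an empty Quantipy meta document (illustrative).
    return {
        'info': {'text': ''},
        'lib': {'default text': text_key, 'values': {}},
        'columns': {},
        'masks': {},
        'sets': {'data file': {'text': {text_key: 'Variable order in source file'},
                               'items': []}},
        'type': 'pandas.DataFrame',
    }

meta = start_meta_sketch()
```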
subset(variables=None, from_set=None, inplace=False)

Create a cloned version of self with a reduced collection of variables.

Parameters:
  • variables (str or list of str, default None) – A list of variable names to include in the new DataSet instance.
  • from_set (str) – The name of an already existing set to base the new DataSet on.
Returns:

subset_ds – The new reduced version of the DataSet.

Return type:

qp.DataSet

take(condition)

Create an index slicer to select rows from the DataFrame component.

Parameters:condition (Quantipy logic expression) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows to be kept.
Returns:slicer – The indices fulfilling the passed logical condition.
Return type:pandas.Index
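The returned slicer is an ordinary pandas.Index and can be fed straight into .loc; a self-contained pandas sketch of the same idea (data made up, boolean mask standing in for Quantipy logic):

```python
import pandas as pd

data = pd.DataFrame({'gender': [1, 2, 1, 2], 'age': [23, 45, 31, 19]})
# Index of rows fulfilling the condition, analogous to take(condition).
slicer = data.index[data['gender'] == 1]
subset = data.loc[slicer]
```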
text(name, shorten=True, text_key=None, axis_edit=None)

Return the variable's text label information.

Parameters:
  • name (str, default None) – The variable name keyed in _meta['columns'] or _meta['masks'].
  • shorten (bool, default True) – If True, text label meta from array items will not report the parent mask’s text. Setting it to False will show the “full” label.
  • text_key (str, default None) – The default text key to be set into the new meta document.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

text – The text metadata.

Return type:

str

to_array(name, variables, label, safe=True)

Combines column variables with same values meta into an array.

Parameters:
  • name (str) – Name of new grid.
  • variables (list of str or list of dicts) – Variable names that become items of the array. New item labels can be added as dict. Example: variables = ['q1_1', {'q1_2': 'shop 2'}, {'q1_3': 'shop 3'}]
  • label (str) – Text label for the mask itself.
  • safe (bool, default True) – If True, the method will raise a ValueError if the provided variable name is already present in self. Select False to forcefully overwrite an existing variable with the same name (independent of its type).
Returns:

Return type:

None

to_delimited_set(name, label, variables, from_dichotomous=True, codes_from_name=True)

Combines multiple single variables to new delimited set variable.

Parameters:
  • name (str) – Name of new delimited set
  • label (str) – Label text for the new delimited set.
  • variables (list of str or list of tuples) – Variables that get combined into the new delimited set. If they are dichotomous (from_dichotomous=True), the labels of the variables are used as category texts or, if tuples are included, the second items will be used for the category texts. If the variables are categorical (from_dichotomous=False) the values of the variables need to be equal and are taken for the delimited set.
  • from_dichotomous (bool, default True) – Define if the input variables are dichotomous or categorical.
  • codes_from_name (bool, default True) – If from_dichotomous=True, the codes can be taken from the Variable names, if they are in form of ‘q01_1’, ‘q01_3’, … In this case the codes will be 1, 3, ….
Returns:

Return type:

None
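The codes_from_name convention can be illustrated with a tiny helper (code_from_name is hypothetical, not part of the API):

```python
def code_from_name(varname):
    # Take the trailing '_<number>' suffix of names like 'q01_3'
    # as the category code for the new delimited set.
    return int(varname.rsplit('_', 1)[1])

codes = [code_from_name(v) for v in ['q01_1', 'q01_3', 'q01_10']]
```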

transpose(name, new_name=None, ignore_items=None, ignore_values=None, copy_data=True, text_key=None, overwrite=False)

Create a new array mask with transposed items / values structure.

This method will automatically create meta and case data additions in the DataSet instance.

Parameters:
  • name (str) – The originating mask variable name keyed in meta['masks'].
  • new_name (str, default None) – The name of the new mask. If not provided explicitly, the new_name will be constructed by suffixing the original name with ‘_trans’, e.g. 'Q2Array_trans'.
  • ignore_items (int or list of int, default None) – If provided, the items listed by their order number in the _meta['masks'][name]['items'] object will not be part of the transposed array. This means they will be ignored while creating the new value codes meta.
  • ignore_values (int or list of int, default None) – If provided, the listed code values will not be part of the transposed array. This means they will not be part of the new item meta.
  • text_key (str) – The text key to be used when generating text objects, i.e. item and value labels.
  • overwrite (bool, default False) – Overwrite variable if new_name is already included.
Returns:

DataSet is modified inplace.

Return type:

None

unbind(name)

Remove the mask-structure for arrays.

uncode(target, mapper, default=None, intersect=None, inplace=True)

Create a new or copied series from data, recoded using a mapper.

Parameters:
  • target (str) – The variable name that is the target of the uncode. If it is keyed in _meta['masks'] the uncode is done for all mask items. If not found in _meta this will fail with an error.
  • mapper (dict) – A mapper of {key: logic} entries.
  • default (str, default None) – The column name to default to in cases where unattended lists are given in your logic, where an auto-transformation of {key: list} to {key: {default: list}} is provided. Note that lists in logical statements are themselves a form of shorthand and this will ultimately be interpreted as: {key: {default: has_any(list)}}.
  • intersect (logical statement, default None) – If a logical statement is given here then it will be used as an implied intersection of all logical conditions given in the mapper.
  • inplace (bool, default True) – If True, the DataSet will be modified inplace with new/updated columns. Will return a new recoded pandas.Series instance if False.
Returns:

Either the DataSet._data is modified inplace or a new pandas.Series is returned.

Return type:

None or uncode_series

undimensionize(names=None, mapper_to_meta=False)

Rename the dataset columns to remove Dimensions compatibility.

undimensionizing_mapper(names=None)

Return a renaming dataset mapper for un-dimensionizing names.

Parameters:None
Returns:mapper – A renaming mapper in the form of a dict of {old: new} that maps Dimensions naming conventions to non-Dimensions naming conventions.
Return type:dict
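Applying such a mapper is a plain dict-based rename; the sketch below also shows one assumed Dimensions-style grid reference (the exact naming pattern is an assumption for illustration, not taken from the library):

```python
import re

def undimensionize_name(name):
    # Pull the plain item name out of an assumed Dimensions-style
    # 'array[{item}].array' reference; leave other names unchanged.
    m = re.match(r'.+\[\{(.+)\}\].+', name)
    return m.group(1) if m else name

mapper = {n: undimensionize_name(n)
          for n in ['q1_grid[{q1_1}].q1_grid', 'gender']}
```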
unify_values(name, code_map, slicer=None, exclusive=False)

Use a mapping of old to new codes to replace code values in _data.

Note

Experimental! Check results carefully!

Parameters:
  • name (str) – The column variable name keyed in meta['columns'].
  • code_map (dict) – A mapping of {old: new}; old and new must be the int-type code values from the column meta data.
  • slicer (Quantipy logic statement, default None) – If provided, the values will only be unified for cases where the condition holds.
  • exclusive (bool, default False) – If True, the recoded unified value will replace whatever is already found in the _data column, ignoring delimited set typed data to which normally would get appended to.
Returns:

Return type:

None
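The core operation amounts to a conditional pandas replace; a minimal sketch for single-coded data (the helper name unify is illustrative, and a boolean mask stands in for the Quantipy logic slicer):

```python
import pandas as pd

def unify(series, code_map, slicer=None):
    # Replace old codes with new ones, optionally only where the
    # slicer condition holds.
    out = series.copy()
    if slicer is None:
        slicer = pd.Series(True, index=series.index)
    out[slicer] = out[slicer].replace(code_map)
    return out

codes = pd.Series([1, 2, 2, 3])
unified = unify(codes, {2: 1})
```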

unroll(varlist, keep=None, both=None)

Replace masks with their items, optionally excluding/keeping certain ones.

Parameters:
  • varlist (list) – A list of meta 'columns' and/or 'masks' names.
  • keep (str or list, default None) – The names of masks that will not be replaced with their items.
  • both ('all', str or list of str, default None) – The names of masks that will be included both as themselves and as collections of their items.

Note

varlist can also contain nesting var1 > var2. The variables which are included in the nesting can also be controlled by keep and both, even if the variables are also included as a “normal” variable.

Example:
>>> ds.unroll(varlist = ['q1', 'q1 > gender'], both='all')
['q1',
 'q1_1',
 'q1_2',
 'q1 > gender',
 'q1_1 > gender',
 'q1_2 > gender']
Returns:unrolled – The modified varlist.
Return type:list
update(data, on='identity', text_properties=None)

Update the DataSet with the case data entries found in data.

Parameters:
  • data (pandas.DataFrame) – A dataframe that contains a subset of columns from the DataSet case data component.
  • on (str, default 'identity') – The column to use as a join key.
  • text_properties (str/ list of str, default=None, {'all', [var_names]}) – Controls the update of the dataset_left properties with properties from the dataset_right. If None, properties from dataset_left will be updated by the ones from the dataset_right. If ‘all’, properties from dataset_left will be kept unchanged. Otherwise, specify the list of properties which will be kept unchanged in the dataset_left; all others will be updated by the properties from dataset_right.
Returns:

DataSet is modified inplace.

Return type:

None
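In pandas terms this is an index-aligned DataFrame.update keyed on the join column; a self-contained sketch (column names made up):

```python
import pandas as pd

left = pd.DataFrame({'identity': [1, 2, 3], 'age': [20, 30, 40]})
new = pd.DataFrame({'identity': [2, 3], 'age': [31, 41]})

# Align both frames on the join key, overwrite matching cells in place,
# then restore the key as a regular column.
merged = left.set_index('identity')
merged.update(new.set_index('identity'))
merged = merged.reset_index()
```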

used_text_keys()

Get a list of all used textkeys in the dataset instance.

validate(spss_limits=False, verbose=True)

Identify and report inconsistencies in the DataSet instance.

name:
column/mask name and meta[collection][var]['name'] are not identical
q_label:
text object is badly formatted or has empty text mapping
values:
categorical variable does not contain values, value text is badly formatted or has empty text mapping
text_keys:
dataset.text_key is not included or existing text keys are not consistent (also for parents)
source:
parents or items do not exist
codes:
codes in data component are not included in meta component
spss limit name:
length of name is greater than spss limit (64 characters) (only shown if spss_limits=True)
spss limit q_label:
length of q_label is greater than spss limit (256 characters) (only shown if spss_limits=True)
spss limit values:
length of any value text is greater than spss limit (120 characters) (only shown if spss_limits=True)
value_texts(name, text_key=None, axis_edit=None)

Get categorical data’s text information.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'].
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

texts – The list of category texts.

Return type:

list

values(name, text_key=None, axis_edit=None)

Get categorical data’s paired code and texts information from the meta.

Parameters:
  • name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
  • axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns:

values – The list of the numerical category codes and their texts packed as tuples.

Return type:

list of tuples
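The pairing can be sketched over an assumed fragment of a categorical variable's value meta (the layout mirrors the code/label pairs shown for add_meta, e.g. [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]; the fragment and helper are illustrative):

```python
# Assumed value meta fragment for one categorical variable.
values_meta = [
    {'value': 1, 'text': {'en-GB': 'Elephant'}},
    {'value': 2, 'text': {'en-GB': 'Mouse'}},
    {'value': 999, 'text': {'en-GB': 'No animal'}},
]

def paired_values(values_meta, text_key='en-GB'):
    """Pair each numeric category code with its label for the text key."""
    return [(v['value'], v['text'][text_key]) for v in values_meta]

paired_values(values_meta)
# [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
```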

variables(setname='data file', numeric=True, string=True, date=True, boolean=True, blacklist=None)

View all DataSet variables listed in their global order.

Parameters:
  • setname (str, default 'data file') – The name of the variable set to query. Defaults to the main variable collection stored via ‘data file’.
  • numeric (bool, default True) – Include int and float type variables?
  • string (bool, default True) – Include string type variables?
  • date (bool, default True) – Include date type variables?
  • boolean (bool, default True) – Include boolean type variables?
  • blacklist (list, default None) – A list of variable names to exclude from the variable listing.
Returns:

varlist – The list of variables registered in the queried set.

Return type:

list
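The type filtering the flags apply can be sketched as follows (the (name, type) listing and helper are illustrative stand-ins for a real variable set):

```python
# Illustrative (name, type) listing standing in for a variable set.
varlist = [('age', 'int'), ('income', 'float'),
           ('name', 'string'), ('dob', 'date')]

def filter_variables(varlist, numeric=True, string=True, date=True,
                     blacklist=None):
    """Keep variables whose type group is enabled and not blacklisted."""
    keep = set()
    if numeric:
        keep |= {'int', 'float'}
    if string:
        keep.add('string')
    if date:
        keep.add('date')
    blacklist = blacklist or []
    return [name for name, qtype in varlist
            if qtype in keep and name not in blacklist]

filter_variables(varlist, string=False)  # ['age', 'income', 'dob']
```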

vmerge(dataset, on=None, left_on=None, right_on=None, row_id_name=None, left_id=None, right_id=None, row_ids=None, overwrite_text=False, from_set=None, uniquify_key=None, reset_index=True, inplace=True, text_properties=None, verbose=True)

Merge Quantipy datasets together by appending rows.

This function merges two Quantipy datasets together, updating variables that exist in the left dataset and appending others. New variables will be appended in the order indicated by the ‘data file’ set if found, otherwise they will be appended in alphanumeric order. This merge happens vertically (row-wise).

Parameters:
  • dataset (quantipy.DataSet or list of quantipy.DataSet) – One or multiple datasets to merge into the current DataSet.
  • on (str, default=None) – The column to use to identify unique rows in both datasets.
  • left_on (str, default=None) – The column to use to identify unique rows in the left dataset.
  • right_on (str, default=None) – The column to use to identify unique rows in the right dataset.
  • row_id_name (str, default=None) – The named column will be filled with the ids indicated for each dataset, as per left_id/right_id/row_ids. If meta for the named column doesn’t already exist, a new column definition will be added and assigned a type appropriate to the given ids.
  • left_id (str/int/float, default=None) – Where the row_id_name column is not already populated for the dataset_left, this value will be populated.
  • right_id (str/int/float, default=None) – Where the row_id_name column is not already populated for the dataset_right, this value will be populated.
  • row_ids (list of str/int/float, default=None) – When a list of datasets is passed, this list provides the row ids that will be populated in the row_id_name column for each of those datasets, respectively.
  • overwrite_text (bool, default=False) – If True, text_keys in the left meta that also exist in right meta will be overwritten instead of ignored.
  • from_set (str, default=None) – Use a set defined in the right meta to control which columns are merged from the right dataset.
  • uniquify_key (str, default None) – An int-like column name found in all the passed DataSet objects that will be protected from having duplicates. The original version of the column will be kept under its name prefixed with ‘original’.
  • reset_index (bool, default=True) – If True, pandas.DataFrame.reset_index() will be applied to the merged dataframe.
  • inplace (bool, default True) – If True, the DataSet will be modified inplace with new/updated rows. Will return a new DataSet instance if False.
  • merge_existing (str/ list of str, default None, {'all', [var_names]}) – Merge values for the defined delimited sets if they exist in both datasets (update_existing is prioritized).
  • text_properties (str/ list of str, default None, {'all', [var_names]}) – Controls how properties in dataset_left are updated with properties from dataset_right. If None, all properties in dataset_left are updated with those from dataset_right. If 'all', all properties in dataset_left are kept unchanged. Otherwise, pass a list of the properties to keep unchanged in dataset_left; all others are updated from dataset_right.
  • verbose (bool, default=True) – Echo progress feedback to the output pane.
Returns:

None or new_dataset – If the merge is not applied inplace, a DataSet instance is returned.

Return type:

None or quantipy.DataSet
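The row-wise append and the row-id tagging can be sketched with plain dicts standing in for case data (illustrative only, not quantipy internals; the column names are assumptions):

```python
# Plain dicts stand in for the left and right case data components.
left = [{'identity': 1, 'q1': 3}, {'identity': 2, 'q1': 5}]
right = [{'identity': 3, 'q1': 1}]

def vmerge_rows(left, right, row_id_name='wave', left_id=1, right_id=2):
    """Append right's rows to left's, tagging each row's dataset of origin."""
    merged = [dict(row, **{row_id_name: left_id}) for row in left]
    merged += [dict(row, **{row_id_name: right_id}) for row in right]
    return merged

rows = vmerge_rows(left, right)
# len(rows) == 3; rows[0]['wave'] == 1; rows[2]['wave'] == 2
```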

weight(weight_scheme, weight_name='weight', unique_key='identity', subset=None, report=True, path_report=None, inplace=True, verbose=True)

Weight the DataSet according to a well-defined weight scheme.

Parameters:
  • weight_scheme (quantipy.Rim instance) – A rim weights setup with defined targets. Can include multiple weight groups and/or filters.
  • weight_name (str, default 'weight') – A name for the float variable that is added to pick up the weight factors.
  • unique_key (str, default 'identity') – A variable inside the DataSet instance that will be used to map the individual case weights to their matching rows.
  • subset (Quantipy complex logic expression) – A logic to filter the DataSet, weighting only the remaining subset.
  • report (bool, default True) – If True, will report a summary of the weight algorithm run and factor outcomes.
  • path_report (str, default None) – A file path to save an .xlsx version of the weight report to.
  • inplace (bool, default True) – If True, the weight factors are merged back into the DataSet instance. Will otherwise return the pandas.DataFrame that contains the weight factors, the unique_key and all variables that have been used to compute the weights (filters, target variables, etc.).
Returns:

Will either create a new column (named via weight_name, default 'weight') in the DataSet instance or return a DataFrame that contains the weight factors.

Return type:

None or pandas.DataFrame
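The merge-back of the computed factors via the unique_key can be sketched as follows (case data, factor values, and the helper are illustrative; the actual weighting is performed by the quantipy.Rim scheme):

```python
# Illustrative case data and weight factors keyed by the unique key.
cases = [{'identity': 1, 'gender': 1}, {'identity': 2, 'gender': 2}]
factors = {1: 0.94, 2: 1.06}

def merge_weights(cases, factors, unique_key='identity',
                  weight_name='weight'):
    """Attach each case's weight factor, matched on the unique key."""
    return [dict(row, **{weight_name: factors[row[unique_key]]})
            for row in cases]

weighted = merge_weights(cases, factors)
# weighted[0]['weight'] == 0.94
```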

write_dimensions(path_mdd=None, path_ddf=None, text_key=None, run=True, clean_up=True, CRLF='CR')

Build Dimensions/SPSS Base Professional .ddf/.mdd data pairs.

Note

SPSS Data Collection Base Professional must be installed on the machine. The method creates .mrs and .dms scripts which are executed through the software’s API.

Parameters:
  • path_mdd (str, default None) – The full path (optionally with extension '.mdd', otherwise assumed as such) for the saved DataSet._meta component. If not provided, the instance’s name and path attributes will be used to determine the file location.
  • path_ddf (str, default None) – The full path (optionally with extension '.ddf', otherwise assumed as such) for the saved DataSet._data component. If not provided, the instance’s name and path attributes will be used to determine the file location.
  • text_key (str, default None) – The desired text_key for all text label information. Uses the DataSet.text_key information if not provided.
  • run (bool, default True) – If True, the method will try to run the metadata creating .mrs script and execute a DMSRun for the case data transformation in the .dms file.
  • clean_up (bool, default True) – By default, all helper files from the conversion (.dms, .mrs, paired .csv files, etc.) will be deleted after the process has finished.
Returns:

A .ddf/.mdd pair is saved at the provided path location.

Return type:

None

write_quantipy(path_meta=None, path_data=None)

Write the data and meta components to .csv/.json files.

The resulting files are well-defined native Quantipy source files.

Parameters:
  • path_meta (str, default None) – The full path (optionally with extension '.json', otherwise assumed as such) for the saved DataSet._meta component. If not provided, the instance’s name and path attributes will be used to determine the file location.
  • path_data (str, default None) – The full path (optionally with extension '.csv', otherwise assumed as such) for the saved DataSet._data component. If not provided, the instance’s name and path attributes will be used to determine the file location.
Returns:

A .csv/.json pair is saved at the provided path location.

Return type:

None
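The shape of the paired output can be sketched with the standard library (the meta fragment is illustrative; a real quantipy meta document carries considerably more structure):

```python
import csv
import json
import os
import tempfile

# Illustrative meta/data pair mirroring the .json/.csv layout.
meta = {'columns': {'q1': {'type': 'int', 'text': {'en-GB': 'Age'}}}}
rows = [{'q1': 34}, {'q1': 28}]

folder = tempfile.mkdtemp()
path_meta = os.path.join(folder, 'survey.json')
path_data = os.path.join(folder, 'survey.csv')

with open(path_meta, 'w') as f:
    json.dump(meta, f)
with open(path_data, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['q1'])
    writer.writeheader()
    writer.writerows(rows)
```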

write_spss(path_sav=None, index=True, text_key=None, mrset_tag_style='__', drop_delimited=True, from_set=None, verbose=True)

Convert the Quantipy DataSet into an SPSS .sav data file.

Parameters:
  • path_sav (str, default None) – The full path (optionally with extension '.sav', otherwise assumed as such) for the converted output file. If not provided, the instance’s name and path attributes will be used to determine the file location.
  • index (bool, default True) – Should the index be inserted into the dataframe before the conversion happens?
  • text_key (str, default None) – The text_key that should be used when taking labels from the source meta. If the given text_key is not found for any particular text object, the DataSet.text_key will be used instead.
  • mrset_tag_style (str, default '__') – The delimiting character/string to use when naming dichotomous set variables. The mrset_tag_style will appear between the name of the variable and the dichotomous variable’s value name, as taken from the delimited set value that dichotomous variable represents.
  • drop_delimited (bool, default True) – Should Quantipy’s delimited set variables be dropped from the export after being converted to dichotomous sets/mrsets?
  • from_set (str) – The set name from which the export should be drawn.
Returns:

An SPSS .sav file is saved at the provided path location.

Return type:

None
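The naming scheme mrset_tag_style controls can be sketched as follows: each delimited-set category becomes a dichotomous variable named '<set name><tag><value name>' (the helper is an illustrative sketch, not quantipy's exporter):

```python
# Sketch of dichotomous-set variable naming per delimited-set category.
def dichotomous_names(set_name, value_names, tag='__'):
    """Build one dichotomous variable name per delimited-set value."""
    return ['{}{}{}'.format(set_name, tag, v) for v in value_names]

dichotomous_names('q5', [1, 2, 3])  # ['q5__1', 'q5__2', 'q5__3']
```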