DataSet¶
-
class quantipy.DataSet(name, dimensions_comp=True)¶
A set of case data (required) and meta data (optional).
-
add_filter_var(name, logic, overwrite=False)¶
Create a filter variable that allows index slicing using manifest_filter.
Parameters: - name (str) – Name and label of the new filter variable, which is also listed in DataSet.filters.
- logic (complex logic/str, list of complex logic/str) – Logic to keep cases. Complex logic should be provided in the form:
` { 'label': 'any text', 'logic': {var: keys} / intersection/ .... } `
If a str (column name) is provided, a logic is created automatically that keeps all cases which are not empty for this column. If logic is a list, each included list item becomes a category of the new filter variable and all cases are kept that satisfy all conditions (intersection).
- overwrite (bool, default False) – Overwrite an already existing filter variable.
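A minimal pure-Python sketch (no Quantipy required) of the str shortcut described above, which keeps every case that is not empty for the named column; the rows here are illustrative stand-ins for case data:

```python
# Sketch of the str shortcut: keep all cases that are not empty
# for the given column (illustrative rows, not Quantipy data).
rows = [{'q1': 1}, {'q1': None}, {'q1': 3}]
keep = [i for i, r in enumerate(rows) if r['q1'] is not None]
print(keep)  # indices of the kept cases
```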
-
add_meta(name, qtype, label, categories=None, items=None, text_key=None, replace=True)¶
Create and insert a well-formed meta object into the existing meta document.
Parameters: - name (str) – The column variable name keyed in meta['columns'].
- qtype ({'int', 'float', 'single', 'delimited set', 'date', 'string'}) – The structural type of the data the meta describes.
- label (str) – The text label information.
- categories (list of str, int, or tuples in form of (int, str), default None) – When a list of str is given, the categorical values will simply be enumerated and mapped to the category labels. If only int are provided, text labels are assumed to be an empty str ('') and a warning is triggered. Alternatively, codes can be mapped to categorical labels, e.g.:
[(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
- items (list of str, int, or tuples in form of (int, str), default None) – If provided, will automatically create an array type mask. When a list of str is given, the item numbers will simply be enumerated and mapped to the item labels. If only int are provided, item text labels are assumed to be an empty str ('') and a warning is triggered. Alternatively, numerical values can be mapped explicitly to item labels, e.g.:
[(1, 'The first item'), (2, 'The second item'), (99, 'Last item')]
- text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
- replace (bool, default True) – If True, an already existing corresponding pd.DataFrame column in the case data component will be overwritten with a new (empty) one.
Returns: DataSet is modified inplace; meta data and _data columns will be added.
Return type: None
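As a rough sketch of how (code, label) tuples could translate into a values-style meta object — the exact {'value': ..., 'text': {...}} layout is an assumption for illustration, not taken from this document:

```python
# Build an illustrative 'values' meta object from (code, label) tuples.
# The {'value': ..., 'text': {...}} layout is an assumed sketch of the
# Quantipy meta format, shown for orientation only.
categories = [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
text_key = 'en-GB'
values = [{'value': code, 'text': {text_key: label}}
          for code, label in categories]
print(values[0])
```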
-
align_order(vlist, align_against=None, integrate_rc=(['_rc', '_rb'], True), fix=[])¶
Align a list to an existing order.
Parameters: - vlist (list of str) – The list which should be reordered.
- align_against (str or list of str, default None) – The variables to align against. If a string is provided, the corresponding set list is taken. If None, the 'data file' set is taken.
- integrate_rc (tuple (list, bool)) – The provided list contains the suffixes for recodes; the bool decides whether parent variables should be replaced by their recodes if the parent variable is not in vlist.
- fix (list of str) – Variables which are fixed at the beginning of the reordered list.
-
all(name, codes)¶
Return a logical has_all() slicer for the passed codes.
Note
When applied to an array mask, the has_all() logic is extended to the item sources, i.e. it must be true for all of the items.
Parameters: - name (str, default None) – The column variable name keyed in _meta['columns'] or _meta['masks'].
- codes (int or list of int) – The codes to build the logical slicer from.
Returns: slicer – The indices fulfilling has_all([codes]).
Return type: pandas.Index
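A pure-Python sketch of the has_all() idea, assuming delimited-set answers stored as ';'-separated code strings (an illustrative convention, not Quantipy's actual internals); swapping all() for any() below gives the has_any() behaviour of the next method:

```python
# has_all() sketch: keep the row indices whose delimited-set string
# contains every one of the requested codes.
data = {0: '1;2;3;', 1: '1;', 2: '2;3;'}
codes = [2, 3]
slicer = [idx for idx, raw in data.items()
          if all(str(c) in raw.rstrip(';').split(';') for c in codes)]
print(slicer)  # row indices where both codes occur
```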
-
any(name, codes)¶
Return a logical has_any() slicer for the passed codes.
Note
When applied to an array mask, the has_any() logic is extended to the item sources, i.e. it must be true for at least one of the items.
Parameters: - name (str, default None) – The column variable name keyed in _meta['columns'] or _meta['masks'].
- codes (int or list of int) – The codes to build the logical slicer from.
Returns: slicer – The indices fulfilling has_any([codes]).
Return type: pandas.Index
-
band(name, bands, new_name=None, label=None, text_key=None)¶
Group numeric data with band definitions treated as group text labels.
Wrapper around derive() for quick banding of numeric data.
Parameters: - name (str) – The column variable name keyed in _meta['columns'] that will be banded into summarized categories.
- bands (list of int/tuple or dict mapping the former to value texts) – The categorical bands to be used. Bands can be single numeric values or ranges, e.g.: [0, (1, 10), 11, 12, (13, 20)]. By default, each band will also make up the value text of the category created in the _meta component. To specify custom texts, map each band to a category name, e.g.: [{'A': 0}, {'B': (1, 10)}, {'C': 11}, {'D': 12}, {'E': (13, 20)}]
- new_name (str, default None) – The created variable will be named '<name>_banded', unless a desired name is provided explicitly here.
- label (str, default None) – The created variable's text label will be identical to the originating one's passed in name, unless a desired label is provided explicitly here.
- text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
Returns: DataSet is modified inplace.
Return type: None
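A minimal pure-Python sketch of the banding logic above: each band is either a single value or a (low, high) range, and the enumerated band position becomes the category code (band_code is a hypothetical helper, not part of the Quantipy API):

```python
# Map numeric answers onto bands of single values or (low, high) ranges.
bands = [0, (1, 10), 11, 12, (13, 20)]

def band_code(value, bands=bands):
    """Return the 1-based band code for value, or None if unbanded."""
    for code, b in enumerate(bands, start=1):
        lo, hi = (b, b) if isinstance(b, int) else b
        if lo <= value <= hi:
            return code
    return None

print(band_code(7), band_code(15))
```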
-
by_type(types=None)¶
Get an overview of all the variables ordered by their type.
Parameters: types (str or list of str, default None) – Restrict the overview to these data types.
Returns: overview – The variables per data type inside the DataSet.
Return type: pandas.DataFrame
-
categorize(name, categorized_name=None)¶
Categorize an int/string/text variable to single.
The values object of the categorized variable is populated with the unique values found in the originating variable (ignoring np.NaN / empty row entries).
Parameters: - name (str) – The column variable name keyed in meta['columns'] that will be categorized.
- categorized_name (str) – If provided, the categorized variable's new name will be drawn from here, otherwise a default name in form of 'name#' will be used.
Returns: DataSet is modified inplace, adding the categorized variable to it.
Return type: None
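The core of categorization can be sketched in pure Python: collect the unique non-empty entries of the source column and enumerate them into a code-to-text mapping (illustrative data and layout, not Quantipy's internals):

```python
# Populate a values mapping from the unique entries of a string column,
# ignoring empty entries, as described above.
column = ['cat', 'dog', None, 'cat', 'bird']
uniques = sorted({v for v in column if v is not None})
values = {code: text for code, text in enumerate(uniques, start=1)}
print(values)
```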
-
clear_factors(name)¶
Remove all factors set in the variable's 'values' object.
Parameters: name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
Returns: Return type: None
-
clone()¶
Get a deep copy of the DataSet instance.
-
code_count(name, count_only=None, count_not=None)¶
Get the total number of codes/entries found per row.
Note
Will be 0/1 for type single and range between 0 and the number of possible values for type delimited set.
Parameters: - name (str) – The column variable name keyed in meta['columns'] or meta['masks'].
- count_only (int or list of int, default None) – Pass a list of codes to restrict counting to.
- count_not (int or list of int, default None) – Pass a list of codes that should not be counted.
Returns: count – A series with the results as ints.
Return type: pandas.Series
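A pure-Python sketch of the per-row counting on delimited-set style data, again assuming the ';'-separated string convention and a hypothetical code_count helper:

```python
# Count codes per row, optionally restricted to a count_only list.
rows = ['1;2;3;', '2;', '', '1;3;']

def code_count(row, count_only=None):
    codes = [int(c) for c in row.rstrip(';').split(';') if c]
    if count_only is not None:
        codes = [c for c in codes if c in count_only]
    return len(codes)

counts = [code_count(r) for r in rows]
restricted = [code_count(r, count_only=[1, 3]) for r in rows]
print(counts, restricted)
```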
-
code_from_label(name, text_label, text_key=None, exact=True, flat=True)¶
Return the code belonging to the passed text label (if present).
Parameters: - name (str) – The originating variable name keyed in meta['columns'] or meta['masks'].
- text_label (str or list of str) – The value text(s) to search for.
- text_key (str, default None) – The desired text_key to search through. Uses the DataSet.text_key information if not provided.
- exact (bool, default True) – text_label must exactly match a categorical value's text. If False, it is enough that the category contains the text_label.
- flat (bool, default True) – If a list is passed for text_label, return all found codes as a regular list. If False, return a list of lists matching the order of the text_label list.
Returns: codes – The list of value codes found for the passed label text.
Return type: list
-
codes(name)¶
Get categorical data's numerical code values.
Parameters: name (str) – The column variable name keyed in _meta['columns'].
Returns: codes – The list of category codes.
Return type: list
-
codes_in_data(name)¶
Get a list of codes that exist in the data.
-
compare(dataset, variables=None, strict=False, text_key=None)¶
Compare types, codes, values, and question labels of two datasets.
Parameters: - dataset (quantipy.DataSet instance) – Test if all variables in the provided dataset are also in self and compare their metadata definitions.
- variables (str, list of str) – Check only these variables.
- strict (bool, default False) – If True, lower/upper case and spaces are taken into account.
- text_key (str, list of str) – The text_keys for which texts are compared.
Returns: Return type: None
-
compare_filter(name1, name2)¶
Show if filters result in the same index.
Parameters: - name1 (str) – Name of the first filter variable.
- name2 (str or list of str) – Name(s) of the filter variable(s) to compare with.
-
convert(name, to)¶
Convert meta and case data between compatible variable types.
Wrapper around the separate as_TYPE() conversion methods.
Parameters: - name (str) – The column variable name keyed in meta['columns'] that will be converted.
- to ({'int', 'float', 'single', 'delimited set', 'string'}) – The variable type to convert to.
Returns: The DataSet variable is modified inplace.
Return type: None
-
copy(name, suffix='rec', copy_data=True, slicer=None, copy_only=None, copy_not=None)¶
Copy meta and case data of the variable definition given per name.
Parameters: - name (str) – The originating column variable name keyed in meta['columns'] or meta['masks'].
- suffix (str, default 'rec') – The new variable name will be constructed by suffixing the original name with _suffix, e.g. 'age_rec'.
- copy_data (bool, default True) – The new variable assumes the data of the original variable.
- slicer (dict) – If the data is copied, it is possible to filter the data with a complex logic. Example: slicer = {'q1': not_any([99])}
- copy_only (int or list of int, default None) – If provided, the copied version of the variable will only contain (data and) meta for the specified codes.
- copy_not (int or list of int, default None) – If provided, the copied version of the variable will contain (data and) meta for all codes, except the indicated ones.
Returns: DataSet is modified inplace, adding a copy to both the data and meta component.
Return type: None
-
copy_array_data(source, target, source_items=None, target_items=None, slicer=None)¶
-
create_set(setname='new_set', based_on='data file', included=None, excluded=None, strings='keep', arrays='masks', replace=None, overwrite=False)¶
Create a new set in dataset._meta['sets'].
Parameters: - setname (str, default 'new_set') – Name of the new set.
- based_on (str, default 'data file') – Name of the set that can be reduced or expanded.
- included (str or list/set/tuple of str) – Names of the variables to be included in the new set. If None, all variables in based_on are taken.
- excluded (str or list/set/tuple of str) – Names of the variables to be excluded from the new set.
- strings ({'keep', 'drop', 'only'}, default 'keep') – Keep, drop or only include string variables.
- arrays ({'masks', 'columns'}, default 'masks') – For arrays, add masks@varname or columns@varname.
- replace (dict) – Replace a variable in the set with another. Example: {'q1': 'q1_rec'}; 'q1' and 'q1_rec' must be included in based_on. 'q1' will be removed and 'q1_rec' will be moved to its position.
- overwrite (bool, default False) – Overwrite if meta['sets'][name] already exists.
Returns: The DataSet is modified inplace.
Return type: None
-
crosstab(x, y=None, w=None, pct=False, decimals=1, text=True, rules=False, xtotal=False, f=None)¶
-
cut_item_texts(arrays=None)¶
Remove array text from array item texts.
Parameters: arrays (str, list of str, default None) – Cut texts for items of these arrays. If None, all keys in ._meta['masks'] are taken.
-
data()¶
Return the data component of the DataSet instance.
-
derive(name, qtype, label, cond_map, text_key=None)¶
Create meta and recode case data by specifying derived category logics.
Parameters: - name (str) – The column variable name keyed in meta['columns'].
- qtype ({'int', 'float', 'single', 'delimited set'}) – The structural type of the data the meta describes.
- label (str) – The text label information.
- cond_map (list of tuples) – Tuples of either two or three elements with the following structures:
2 elements, no labels provided: (code, <qp logic expression here>), e.g.:
(1, intersection([{'gender': [1]}, {'age': frange('30-40')}]))
2 elements, no codes provided: ('text label', <qp logic expression here>), e.g.:
('Cat 1', intersection([{'gender': [1]}, {'age': frange('30-40')}]))
3 elements, with codes + labels: (code, 'Label goes here', <qp logic expression here>), e.g.:
(1, 'Men, 30 to 40', intersection([{'gender': [1]}, {'age': frange('30-40')}]))
- text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance's text_key property information if not provided.
Returns: DataSet is modified inplace.
Return type: None
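The 3-element cond_map form can be sketched in pure Python by standing in plain predicates for the Quantipy logic expressions (the lambda-based logic here is an illustrative substitute, not qp logic):

```python
# Evaluate (code, label, logic) tuples against illustrative case rows;
# the first matching tuple's code is assigned, None otherwise.
rows = [{'gender': 1, 'age': 35},
        {'gender': 2, 'age': 35},
        {'gender': 1, 'age': 50}]
cond_map = [(1, 'Men, 30 to 40',
             lambda r: r['gender'] == 1 and 30 <= r['age'] <= 40)]
derived = [next((code for code, _label, logic in cond_map if logic(r)), None)
           for r in rows]
print(derived)
```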
-
derotate(levels, mapper, other=None, unique_key='identity', dropna=True)¶
Derotate data and meta using the given mapper, appending others.
This function derotates data using the specification defined in mapper, which is a list of dicts of lists, describing how columns from data can be read as a hierarchical structure.
Returns a derotated DataSet instance and saves data and meta as json and csv.
Parameters: - levels (dict) – The name and values of a new column variable to identify cases.
- mapper (list of dicts of lists) – A list of dicts where the new column names are keys to lists of source columns. Example:
>>> mapper = [{'q14_1': ['q14_1_1', 'q14_1_2', 'q14_1_3']},
...           {'q14_2': ['q14_2_1', 'q14_2_2', 'q14_2_3']},
...           {'q14_3': ['q14_3_1', 'q14_3_2', 'q14_3_3']}]
- unique_key (str) – Name of the column variable that will be copied to the new dataset.
- other (list, optional; default None) – A list of additional columns from the source data to be appended to the end of the resulting stacked dataframe.
- dropna (bool, optional; default True) – Passed through to the pandas.DataFrame.stack() operation.
Returns: Return type: new qp.DataSet instance
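The mapper semantics can be sketched for a single case row in pure Python: each position across the source-column lists becomes one derotated (long-format) row, with the dict keys as the new column names (data and column names here are illustrative):

```python
# Derotate one wide case row into long format using a
# {new_name: [source_columns]} mapper, one output row per loop level.
row = {'q14_1_1': 'a', 'q14_1_2': 'b', 'q14_2_1': 'c', 'q14_2_2': 'd'}
mapper = [{'q14_1': ['q14_1_1', 'q14_1_2']},
          {'q14_2': ['q14_2_1', 'q14_2_2']}]
levels = len(next(iter(mapper[0].values())))
out = []
for i in range(levels):
    rec = {}
    for m in mapper:
        (new, srcs), = m.items()
        rec[new] = row[srcs[i]]
    out.append(rec)
print(out)
```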
-
describe(var=None, only_type=None, text_key=None, axis_edit=None)¶
Inspect the DataSet's global or variable level structure.
-
dichotomize(name, value_texts=None, keep_variable_text=True, ignore=None, replace=False, text_key=None)¶
-
dimensionize(names=None)¶
Rename the dataset columns for Dimensions compatibility.
-
dimensionizing_mapper(names=None)¶
Return a renaming dataset mapper for dimensionizing names.
Returns: mapper – A renaming mapper in the form of a dict of {old: new} that maps non-Dimensions naming conventions to Dimensions naming conventions.
Return type: dict
-
drop(name, ignore_items=False)¶
Drop variables from the meta and data components of the DataSet.
Parameters: - name (str or list of str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
- ignore_items (bool) – If False, source variables for arrays in _meta['columns'] are dropped; otherwise they are kept.
Returns: DataSet is modified inplace.
Return type: None
-
drop_duplicates(unique_id='identity', keep='first', sort_by=None)¶
Drop duplicated cases from self._data.
Parameters: - unique_id (str) – Variable name that gets scanned for duplicates.
- keep (str, {'first', 'last'}) – Keep the first or last of the duplicates.
- sort_by (str) – Name of a variable to sort the data by, for example 'endtime'; it helps determine which duplicate keep refers to.
-
duplicates(name='identity')¶
Return a list with duplicated values for the provided name.
Parameters: name (str, default 'identity') – The column variable name keyed in meta['columns'].
Returns: vals – A list of duplicated values found in the named variable.
Return type: list
-
empty(name, condition=None)¶
Check variables for emptiness (optionally restricted by a condition).
Parameters: - name ((list of) str) – The mask variable name keyed in _meta['columns'].
- condition (Quantipy logic expression, default None) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is considered.
Returns: empty
Return type: bool
-
empty_items(name, condition=None, by_name=True)¶
Test arrays for item emptiness (optionally restricted by a condition).
Parameters: - name ((list of) str) – The mask variable name keyed in _meta['masks'].
- condition (Quantipy logic expression, default None) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is considered.
- by_name (bool, default True) – Return array items by their name or their index.
Returns: empty – The list of empty items by their source names or positional index (starting from 1; mapped to their parent mask name if more than one).
Return type: list
-
extend_filter_var(name, logic, extend_as=None)¶
Extend the logic of an existing filter variable.
Parameters: - name (str) – Name of the existing filter variable.
- logic ((list of) complex logic/str) – Additional logic to keep cases (intersection with the existing logic). Complex logic should be provided in the form:
` { 'label': 'any text', 'logic': {var: keys} / intersection/ .... } `
- extend_as (str, default None) – Suffix added to the filter name to create a new filter. If None, the existing filter variable is overwritten.
-
extend_items(name, ext_items, text_key=None)¶
Extend the mask items of an existing array.
Parameters: - name (str) – The originating column variable name keyed in meta['masks'].
- ext_items (list of str/list of dict) – The label(s) of the new item(s). Can be provided as str, in which case the new column is named after the grid and the item number, or as dict {'new_column': 'label'}.
- text_key (str/list of str, default None) – Text key for text-based label information. Will automatically fall back to the instance's text_key property information if not provided.
-
extend_values(name, ext_values, text_key=None, safe=True)¶
Add to the 'values' object of existing column or mask meta data.
Attempting to add already existing value codes or providing already present value texts will both raise a ValueError!
Parameters: - name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
- ext_values (list of str or tuples in form of (int, str), default None) – When a list of str is given, the categorical values will simply be enumerated and mapped to the category labels. Alternatively, codes can be mapped to categorical labels, e.g.: [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
- text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance's text_key property information if not provided.
- safe (bool, default True) – If set to False, duplicate value texts are allowed when extending the values object.
Returns: The DataSet is modified inplace.
Return type: None
-
factors(name)¶
Get categorical data's statistical factor values.
Parameters: name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
Returns: factors – A {value: factor} mapping.
Return type: OrderedDict
-
filter(alias, condition, inplace=False)¶
Filter the DataSet using a Quantipy logical expression.
-
find(str_tags=None, suffixed=False)¶
Find variables by searching their names for substrings.
Parameters: - str_tags ((list of) str) – The string tags to look for in the variable names. If not provided, the module's default global list of substrings from VAR_SUFFIXES will be used.
- suffixed (bool, default False) – If set to True, only variable names that end with a given string sequence will qualify.
Returns: found – The list of matching variable names.
Return type: list
-
find_duplicate_texts(name, text_key=None)¶
Collect values that share the same text information to find duplicates.
Parameters: - name (str) – The column variable name keyed in _meta['columns'] or _meta['masks'].
- text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance's text_key property information if not provided.
-
first_responses(name, n=3, others='others', reduce_values=False)¶
Create n-first mentions from the set of responses of a delimited set.
Parameters: - name (str) – The column variable name of a delimited set keyed in meta['columns'].
- n (int, default 3) – The number of mentions that will be turned into single-type variables, i.e. 1st mention, 2nd mention, 3rd mention, 4th mention, etc.
- others (None or str, default 'others') – If provided, all remaining values will end up in a new delimited set variable reduced by the responses transferred to the single mention variables.
- reduce_values (bool, default False) – If True, each new variable will only list the categorical value metadata for the codes found in the respective data vector, i.e. not the initial full codeframe.
Returns: DataSet is modified inplace.
Return type: None
-
flatten(name, codes, new_name=None, text_key=None)¶
Create a variable that groups array mask item answers into categories.
Parameters: - name (str) – The array variable name keyed in meta['masks'] that will be converted.
- codes (int, list of int) – The answer codes that determine the categorical grouping. Item labels will become the category labels.
- new_name (str, default None) – The name of the new delimited set variable. If None, name is suffixed with '_rec'.
- text_key (str, default None) – Text key for text-based label information. Uses the DataSet.text_key information if not provided.
Returns: The DataSet is modified inplace, the delimited set variable is added.
Return type: None
-
force_texts(copy_to=None, copy_from=None, update_existing=False)¶
Copy info from an existing text_key to a new one or update the existing one.
Parameters: - copy_to (str) – {'en-GB', 'da-DK', 'fi-FI', 'nb-NO', 'sv-SE', 'de-DE'} The text key that will be filled. If None, _meta['lib']['default text'] is used.
- copy_from (str/list) – {'en-GB', 'da-DK', 'fi-FI', 'nb-NO', 'sv-SE', 'de-DE'} A list of text_keys can also be given; if the first text_key doesn't exist, the next one is taken.
- update_existing (bool) – True: copy_to will be filled in any case. False: copy_to will only be filled if it is empty or not existing.
Returns: Return type: None
-
from_batch(batch_name, include='identity', text_key=[], apply_edits=True, additions='variables')¶
Get a filtered subset of the DataSet using qp.Batch definitions.
Parameters: - batch_name (str) – Name of a Batch included in the DataSet.
- include (str/list of str) – Names of variables that get included even if they are not in the Batch.
- text_key (str/list of str, default None) – Take over all texts of the included text_key(s); if None is provided, all included text_keys are taken.
- apply_edits (bool, default True) – meta_edits and rules are applied to the global meta of the new DataSet instance.
- additions ({'variables', 'filters', 'full', None}) – Extend the included variables by the xks, yks and weights of the additional batches if set to 'variables'; 'filters' will create new 1/0-coded variables that reflect any filters defined. Selecting 'full' will do both, None will ignore additional Batches completely.
Returns: b_ds
Return type: quantipy.DataSet
-
from_components(data_df, meta_dict=None, reset=True, text_key=None)¶
Attach data and meta directly to the DataSet instance.
Note
Except for testing for appropriate object types, this method offers no additional safeguards or consistency/compatibility checks with regard to the passed data and meta documents!
Parameters: - data_df (pandas.DataFrame) – A DataFrame that contains case data entries for the DataSet.
- meta_dict (dict, default None) – A dict that stores meta data describing the columns of the data_df. It is assumed to be well-formed following the Quantipy meta data structure.
- reset (bool, default True) – Clean the 'lib' and 'sets' metadata collections from non-native entries, e.g. user-defined information or helper metadata.
- text_key (str, default None) – The text_key to be used. If not provided, the 'default text' from the meta['lib'] definition will be attempted.
Returns: Return type: None
-
from_excel(path_xlsx, merge=True, unique_key='identity')¶
Convert an Excel file to a dataset and/or merge variables.
Parameters: - path_xlsx (str) – Path where the Excel file is stored. The file must have exactly one sheet with data.
- merge (bool) – If True, the new data from the Excel file will be merged into the dataset.
- unique_key (str) – If merge=True, an hmerge is done on this variable.
Returns: new_dataset – Contains only the data from the Excel file. If merge=True, the dataset is modified inplace.
Return type: quantipy.DataSet
-
from_stack(stack, data_key=None, dk_filter=None, reset=True)¶
Use quantipy.Stack data and meta to create a DataSet instance.
Parameters: - stack (quantipy.Stack) – The Stack instance to convert.
- data_key (str) – The reference name where meta and data information are stored.
- dk_filter (string, default None) – Filter name if the stack contains more than one filter. If None, 'no_filter' will be used.
- reset (bool, default True) – Clean the 'lib' and 'sets' metadata collections from non-native entries, e.g. user-defined information or helper metadata.
Returns: Return type: None
-
Get all array definitions that contain only hidden items.
Returns: hidden – The list of array mask names.
Return type: list
-
get_batch(name)¶
Get an existing Batch instance from the DataSet meta information.
Parameters: name (str) – Name of the existing Batch instance.
-
get_property
(name, prop_name, text_key=None)¶
-
hide_empty_items(condition=None, arrays=None)¶
Apply rules meta to automatically hide empty array items.
Parameters: - arrays ((list of) str, default None) – The array mask variable names keyed in _meta['masks']. If not explicitly provided, all array mask definitions are tested.
- condition (Quantipy logic expression) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows is considered.
Returns: Return type: None
-
hiding(name, hide, axis='y', hide_values=True)¶
Set or update rules[axis]['dropx'] meta for the named column.
Quantipy builds will respect the hidden codes and cut them from results.
Note
This is not equivalent to DataSet.set_missings() as missing values are respected also in computations.
Parameters: - name (str or list of str) – The column variable name(s) keyed in _meta['columns'].
- hide (int or list of int) – Values indicated by their int codes will be dropped from Quantipy.View.dataframes.
- axis ({'x', 'y'}, default 'y') – The axis to drop the values from.
- hide_values (bool, default True) – Only considered if name refers to a mask. If True, values are hidden on all mask items. If False, mask items are hidden by position (only for array summaries).
Returns: Return type: None
-
hmerge(dataset, on=None, left_on=None, right_on=None, overwrite_text=False, from_set=None, inplace=True, update_existing=None, merge_existing=None, text_properties=None, verbose=True)¶
Merge Quantipy datasets together using an index-wise identifier.
This function merges two Quantipy datasets together, updating variables that exist in the left dataset and appending others. New variables will be appended in the order indicated by the 'data file' set if found, otherwise they will be appended in alphanumeric order. This merge happens horizontally (column-wise). Packed kwargs will be passed on to the pandas.DataFrame.merge() method call, but that merge will always happen using how='left'.
Parameters: - dataset (quantipy.DataSet) – The dataset to merge into the current DataSet.
- on (str, default None) – The column to use as a join key for both datasets.
- left_on (str, default None) – The column to use as a join key for the left dataset.
- right_on (str, default None) – The column to use as a join key for the right dataset.
- overwrite_text (bool, default False) – If True, text_keys in the left meta that also exist in the right meta will be overwritten instead of ignored.
- from_set (str, default None) – Use a set defined in the right meta to control which columns are merged from the right dataset.
- inplace (bool, default True) – If True, the DataSet will be modified inplace with new/updated columns. Will return a new DataSet instance if False.
- update_existing (str/list of str, default None, {'all', [var_names]}) – Update values for defined delimited sets if they exist in both datasets.
- text_properties (str/list of str, default None, {'all', [var_names]}) – Controls the update of the dataset_left properties with properties from the dataset_right. If None, properties from dataset_left will be updated by the ones from dataset_right. If 'all', properties from dataset_left will be kept unchanged. Otherwise, specify the list of properties which will be kept unchanged in dataset_left; all others will be updated by the properties from dataset_right.
- verbose (bool, default True) – Echo progress feedback to the output pane.
Returns: None or new_dataset – If the merge is not applied inplace, a DataSet instance is returned.
Return type: quantipy.DataSet
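The how='left' semantics described above can be sketched in pure Python: every left case survives, and right columns are attached where the join key matches (illustrative rows keyed on an 'identity' column, not the pandas implementation):

```python
# Horizontal left merge sketch: attach right-hand columns onto left
# rows by join key; unmatched left rows keep only their own columns.
left = [{'identity': 1, 'q1': 5}, {'identity': 2, 'q1': 3}]
right = {1: {'q2': 9}, 3: {'q2': 7}}  # right data keyed by identity
merged = [{**row, **right.get(row['identity'], {})} for row in left]
print(merged)
```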
-
interlock(name, label, variables, val_text_sep='/')¶
Build a new category-intersected variable from >= 2 incoming variables.
Parameters: - name (str) – The new column variable name keyed in _meta['columns'].
- label (str) – The new text label for the created variable.
- variables (list of >= 2 str or dict (mapper)) – The column names of the variables that feed into the intersecting recode operation, or dicts/mappers to create temporary variables for the interlock. Can also be a mix of str and dict. Example:
>>> ['gender',
...  {'agegrp': [(1, '18-34', {'age': frange('18-34')}),
...              (2, '35-54', {'age': frange('35-54')}),
...              (3, '55+', {'age': is_ge(55)})]},
...  'region']
- val_text_sep (str, default '/') – The passed character (or any other str value) will be used to separate the incoming individual value texts to make up the intersected category value texts, e.g.: 'Female/18-30/London'.
Returns: Return type: None
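The category-intersection step can be sketched in pure Python: every combination of the incoming variables' value texts becomes one interlocked category, joined with val_text_sep (value texts here are illustrative):

```python
# Build intersected categories from two variables' value texts,
# enumerating the cartesian product into category codes.
from itertools import product

gender = ['Female', 'Male']
agegrp = ['18-34', '35-54', '55+']
val_text_sep = '/'
interlocked = {code: val_text_sep.join(combo)
               for code, combo in enumerate(product(gender, agegrp), start=1)}
print(interlocked)
```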
-
is_like_numeric(name)¶
Test if a string-typed variable can be expressed numerically.
Parameters: name (str) – The column variable name keyed in _meta['columns'].
Returns: Return type: bool
-
is_nan(name)¶
Detect empty entries in the _data rows.
Parameters: name (str) – The column variable name keyed in meta['columns'].
Returns: count – A series with the results as bool.
Return type: pandas.Series
-
is_subfilter
(name1, name2)¶ Verify if index of name2 is part of the index of name1.
-
item_no
(name)¶ Return the order/position number of passed array item variable name.
Parameters: name (str) – The column variable name keyed in _meta['columns']
.Returns: no – The positional index of the item (starting from 1). Return type: int
-
item_texts
(name, text_key=None, axis_edit=None)¶ Get the
text
meta data for the items of the passed array mask name.Parameters: - name (str) – The mask variable name keyed in
_meta['masks']
. - text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: texts – The list of item texts for the array elements.
Return type: list
-
items
(name, text_key=None, axis_edit=None)¶ Get the array’s paired item names and texts information from the meta.
Parameters: - name (str) – The column variable name keyed in
_meta['masks']
. - text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: items – The list of source item names (from
_meta['columns']
) and theirtext
information packed as tuples.Return type: list of tuples
-
link
(filters=None, x=None, y=None, views=None)¶ Create a Link instance from the DataSet.
-
manifest_filter
(name)¶ Get index slicer from filter-variables.
Parameters: name (str) – Name of the filter_variable.
-
merge_texts
(dataset)¶ Add additional
text
versions from othertext_key
meta.Case data will be ignored during the merging process.
Parameters: dataset ((A list of multiple) quantipy.DataSet
) – One or multiple datasets that provide newtext_key
meta.Returns: Return type: None
-
meta
(name=None, text_key=None, axis_edit=None)¶ Provide a pretty summary for variable meta given as per
name
.Parameters: - name (str, default None) – The variable name keyed in
_meta['columns']
or_meta['masks']
. If None, the entiremeta
component of theDataSet
instance will be returned. - text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: meta – Either a DataFrame that sums up the meta information on a
mask
or column
or the meta dict as a whole is returned. Return type: dict or pandas.DataFrame
-
meta_to_json
(key=None, collection=None)¶ Save a meta object as json file.
Parameters: - key (str, default None) – Name of the variable whose metadata is saved. If no key is provided, the included collection or the whole meta is saved.
- collection (str {'columns', 'masks', 'sets', 'lib'}, default None) – The meta object is taken from this collection.
Returns: Return type: None
-
min_value_count
(name, min=50, weight=None, condition=None, axis='y', verbose=True)¶ Wrapper for self.hiding(), which hides values with low counts.
Parameters: - variables (str/ list of str) – Name(s) of the variable(s) whose values are checked against the defined border.
- min (int) – If the amount of counts for a value is below this number, the value is hidden.
- weight (str, default None) – Name of the weight variable used to calculate the weighted counts.
- condition (complex logic) – The data used to calculate the counts can be filtered by this condition.
- axis ({'y', 'x', ['x', 'y']}, default 'y') – The axis on which the values are hidden.
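The hiding border can be illustrated with a small stand-alone sketch (a hypothetical helper, not the DataSet API): tally each value, optionally weighted, and collect the codes that fall below the minimum:

```python
from collections import Counter

def low_count_codes(answers, min_count=50, weights=None):
    # A weight of 1 per case reproduces plain (unweighted) counts.
    weights = weights or [1] * len(answers)
    counts = Counter()
    for code, w in zip(answers, weights):
        counts[code] += w
    # Codes below the border are the ones a hiding rule would target.
    return sorted(code for code, n in counts.items() if n < min_count)

hidden = low_count_codes([1, 1, 2, 1], min_count=2)
```

Here code 2 appears only once and would be hidden; min_value_count() then forwards such codes to self.hiding().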
-
names
(ignore_items=True)¶ Find all weak-duplicate variable names that are different only by case.
Note
Will return self.variables() if no weak-duplicates are found.
Returns: weak_dupes – An overview of case-sensitive spelling differences in otherwise equal variable names. Return type: pd.DataFrame
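The weak-duplicate detection amounts to grouping names case-insensitively; a minimal sketch (hypothetical helper, returning a plain dict instead of the documented pd.DataFrame):

```python
from collections import defaultdict

def weak_duplicates(variable_names):
    # Group names by their lowercased spelling ...
    groups = defaultdict(list)
    for name in variable_names:
        groups[name.lower()].append(name)
    # ... and keep only the groups that differ by case alone.
    return {k: v for k, v in groups.items() if len(v) > 1}

dupes = weak_duplicates(['Gender', 'gender', 'age', 'Region'])
```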
-
order
(new_order=None, reposition=None, regroup=False)¶ Set the global order of the DataSet variables collection.
The global order of the DataSet is reflected in the data component’s pd.DataFrame.columns order and the variable references in the meta component’s ‘data file’ items.
Parameters: - new_order (list) – A list of all DataSet variables in the desired order.
- reposition ((List of) dict) – Each dict maps one or a list of variables to a reference variable name key. The mapped variables are moved before the reference key.
- regroup (bool, default False) – Attempt to regroup non-native variables (i.e. created either
manually with
add_meta()
,recode()
,derive()
, etc. or automatically by manifestingqp.View
objects) with their originating variables.
Returns: Return type: None
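The reposition mapping can be sketched as a plain list operation (assuming, as described above, that each dict is keyed by the reference variable; the helper name is hypothetical):

```python
def reposition(order, mapping):
    # Move the mapped variables directly before their reference key.
    new_order = list(order)
    for anchor, movers in mapping.items():
        if not isinstance(movers, list):
            movers = [movers]
        for m in movers:
            new_order.remove(m)
        idx = new_order.index(anchor)
        new_order[idx:idx] = movers
    return new_order

order = reposition(['age', 'q1', 'q2', 'gender'], {'q1': 'gender'})
```

'gender' now sits directly before 'q1'; DataSet.order() applies the same idea to both the DataFrame columns and the 'data file' set items.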
-
parents
(name)¶ Get the
parent
meta information for masks-structured column elements.Parameters: name (str) – The mask variable name keyed in _meta['columns']
.Returns: parents – The list of parents the _meta['columns']
variable is attached to.Return type: list
-
populate
(batches='all', verbose=True)¶ Create a
qp.Stack
based on all availableqp.Batch
definitions.Parameters: batches (str/ list of str) – Name(s) of qp.Batch
instances that are used to populate theqp.Stack
.Returns: Return type: qp.Stack
-
read_ascribe
(path_meta, path_data, text_key)¶ Load Dimensions .xml/.txt files, connecting as data and meta components.
Parameters: - path_meta (str) – The full path (optionally with extension
'.xml'
, otherwise assumed as such) to the meta data defining'.xml'
file. - path_data (str) – The full path (optionally with extension
'.txt'
, otherwise assumed as such) to the case data defining'.txt'
file.
Returns: The
DataSet
is modified inplace, connected to Quantipy data and meta components that have been converted from their Ascribe source files.Return type: None
-
read_dimensions
(path_meta, path_data)¶ Load Dimensions .ddf/.mdd files, connecting as data and meta components.
Parameters: - path_meta (str) – The full path (optionally with extension
'.mdd'
, otherwise assumed as such) to the meta data defining'.mdd'
file. - path_data (str) – The full path (optionally with extension
'.ddf'
, otherwise assumed as such) to the case data defining'.ddf'
file.
Returns: The
DataSet
is modified inplace, connected to Quantipy data and meta components that have been converted from their Dimensions source files.Return type: None
-
read_quantipy
(path_meta, path_data, reset=True)¶ Load Quantipy .csv/.json files, connecting as data and meta components.
Parameters: - path_meta (str) – The full path (optionally with extension
'.json'
, otherwise assumed as such) to the meta data defining'.json'
file. - path_data (str) – The full path (optionally with extension
'.csv'
, otherwise assumed as such) to the case data defining'.csv'
file. - reset (bool, default True) – Clean the ‘lib’ and
'sets'
metadata collections from non-native entries, e.g. user-defined information or helper metadata.
Returns: The
DataSet
is modified inplace, connected to Quantipy native data and meta components.Return type: None
-
read_spss
(path_sav, **kwargs)¶ Load SPSS Statistics .sav files, converting and connecting data/meta.
Parameters: path_sav (str) – The full path (optionally with extension '.sav'
, otherwise assumed as such) to the'.sav'
file.Returns: The DataSet
is modified inplace, connected to Quantipy data and meta components that have been converted from the SPSS source file.Return type: None
-
recode
(target, mapper, default=None, append=False, intersect=None, initialize=None, fillna=None, inplace=True)¶ Create a new or copied series from data, recoded using a mapper.
This function takes a mapper of {key: logic} entries and injects the key into the target column where its paired logic is True. The logic may be arbitrarily complex and may refer to any other variable or variables in data. Where a pre-existing column has been used to start the recode, the injected values can replace or be appended to any data found there to begin with. Note that this function does not edit the target column; it returns a recoded copy of the target column. The recoded data will always comply with the column type indicated for the target column according to the meta.
Parameters: - target (str) – The column variable name keyed in
_meta['columns']
that is the target of the recode. If not found in_meta
this will fail with an error. Iftarget
is not found in data.columns the recode will start from an empty series with the same index as_data
. Iftarget
is found in data.columns the recode will start from a copy of that column. - mapper (dict) – A mapper of {key: logic} entries.
- default (str, default None) – The column name to default to in cases where unattended lists are given in your logic, where an auto-transformation of {key: list} to {key: {default: list}} is provided. Note that lists in logical statements are themselves a form of shorthand and this will ultimately be interpreted as: {key: {default: has_any(list)}}.
- append (bool, default False) – Should the new recoded data be appended to values already found in the series? If False, data from series (where found) will overwrite whatever was found for that item instead.
- intersect (logical statement, default None) – If a logical statement is given here then it will be used as an implied intersection of all logical conditions given in the mapper.
- initialize (str or np.NaN, default None) – If not None, a copy of the data named column will be used to populate the target column before the recode is performed. Alternatively, initialize can be used to populate the target column with np.NaNs (overwriting whatever may be there) prior to the recode.
- fillna (int, default=None) – If not None, the value passed to fillna will be used on the recoded series as per pandas.Series.fillna().
- inplace (bool, default True) – If True, the
DataSet
will be modified inplace with new/updated columns. Will return a new recodedpandas.Series
instance if False.
Returns: Either the
DataSet._data
is modified inplace or a newpandas.Series
is returned.Return type: None or recode_series
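The {key: logic} injection can be sketched without Quantipy (the logic entries here are plain predicates over a per-case dict rather than Quantipy logic expressions; all names are hypothetical):

```python
def recode_series(rows, mapper, append=False):
    result = []
    for row in rows:
        # Collect every key whose paired logic evaluates True for this case.
        hits = [code for code, logic in mapper.items() if logic(row)]
        if not hits:
            result.append(None)
        elif append:
            result.append(hits)        # accumulate, delimited-set style
        else:
            result.append(hits[-1])    # later keys overwrite earlier ones
    return result

rows = [{'age': 25}, {'age': 40}, {'age': 70}]
codes = recode_series(rows, {
    1: lambda r: 18 <= r['age'] <= 34,
    2: lambda r: 35 <= r['age'] <= 54,
    3: lambda r: r['age'] >= 55,
})
```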
-
reduce_filter_var
(name, values)¶ Remove values from filter-variables and recalculate the filter.
-
remove_html
()¶ Cycle through all meta
text
objects removing html tags.Currently uses the regular expression ‘<.*?>’ in _remove_html() classmethod.
Returns: Return type: None
-
remove_items
(name, remove)¶ Erase array mask items safely from both meta and case data components.
Parameters: - name (str) – The originating column variable name keyed in
meta['masks']
. - remove (int or list of int) – The items listed by their order number in the
_meta['masks'][name]['items']
object will be dropped from the mask
definition.
Returns: DataSet is modified inplace.
Return type: None
-
remove_values
(name, remove)¶ Erase value codes safely from both meta and case data components.
Attempting to remove all value codes from the variable’s value object will raise a
ValueError
!Parameters: - name (str) – The originating column variable name keyed in
meta['columns']
ormeta['masks']
. - remove (int or list of int) – The codes to be removed from the
DataSet
variable.
Returns: DataSet is modified inplace.
Return type: None
-
rename
(name, new_name)¶ Change meta and data column name references of the variable definition.
Parameters: - name (str) – The originating column variable name keyed in
meta['columns']
ormeta['masks']
. - new_name (str) – The new variable name.
Returns: DataSet is modified inplace. The new name reference replaces the original one.
Return type: None
-
rename_from_mapper
(mapper, keep_original=False, ignore_batch_props=False)¶ Rename meta objects and data columns using mapper.
Parameters: mapper (dict) – A renaming mapper in the form of a dict of {old: new} that will be used to rename columns throughout the meta and data. Returns: DataSet is modified inplace. Return type: None
-
reorder_items
(name, new_order)¶ Apply a new order to mask items.
Parameters: - name (str) – The variable name keyed in
_meta['masks']
. - new_order (list of int, default None) – The new order of the mask items. The included ints match up to
the number of the items (
DataSet.item_no('item_name')
).
Returns: DataSet is modified inplace.
Return type: None
-
reorder_values
(name, new_order=None)¶ Apply a new order to the value codes defined by the meta data component.
Parameters: - name (str) – The column variable name keyed in
_meta['columns']
or_meta['masks']
. - new_order (list of int, default None) – The new code order of the DataSet variable. If no order is given,
the
values
object is sorted ascending.
Returns: DataSet is modified inplace.
Return type: None
-
repair
()¶ Try to fix legacy meta data inconsistencies and badly shaped array / data file items 'sets' meta definitions.
-
repair_text_edits
(text_key=None)¶ Cycle through all meta
text
objects repairing axis edits.Parameters: text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys for which text edits should be included. Returns: Return type: None
-
replace_texts
(replace, text_key=None)¶ Cycle through all meta
text
objects replacing unwanted strings.Parameters: - replace (dict, default Nonea) – A dictionary mapping {unwanted string: replacement string}.
- text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys for which unwanted strings are replaced.
Returns: Return type: None
-
resolve_name
(name)¶
-
restore_item_texts
(arrays=None)¶ Restore array item texts.
Parameters: arrays (str, list of str, default None) – Restore texts for items of these arrays. If None, all keys in ._meta['masks']
are taken.
-
revert
()¶ Return to a previously saved state of the DataSet.
Note
This method is designed primarily for use in interactive Python environments like iPython/Jupyter and their notebook applications.
-
roll_up
(varlist, ignore_arrays=None)¶ Replace any array items with their parent mask variable definition name.
Parameters: - varlist (list) – A list of meta
'columns'
and/or'masks'
names. - ignore_arrays ((list of) str) – A list of array mask names that should not be rolled up if their
items are found inside
varlist
.
Note
varlist can also contain nesting var1 > var2. The variables which are included in the nesting can also be controlled by keep and both, even if the variables are also included as a “normal” variable.
Returns: rolled_up – The modified varlist
.Return type: list
-
save
()¶ Save the current state of the DataSet’s data and meta.
The saved file will be temporarily stored inside the cache. Use this to take a snapshot of the DataSet state to easily revert back to at a later stage.
Note
This method is designed primarily for use in interactive Python environments like iPython/Jupyter notebook applications.
-
select_text_keys
(text_key=None)¶ Cycle through all meta
text
objects keep only selected text_key.Parameters: text_key (str / list of str, default None) – {None, ‘en-GB’, ‘da-DK’, ‘fi-FI’, ‘nb-NO’, ‘sv-SE’, ‘de-DE’} The text_keys which should be kept. Returns: Return type: None
-
classmethod
set_encoding
(encoding)¶ Hack sys.setdefaultencoding() to escape ASCII hell.
Parameters: encoding (str) – The name of the encoding to default to.
-
set_factors
(name, factormap, safe=False)¶ Apply numerical factors to (
single
-type categorical) variables. Factors can be read while aggregating descriptive statistics
qp.Views
.Parameters: - name (str) – The column variable name keyed in
_meta['columns']
or_meta['masks']
. - factormap (dict) – A mapping of
{value: factor}
(int
toint
). - safe (bool, default False) – Set to
True
to prevent setting factors to thevalues
meta data of non-single
type variables.
Returns: Return type: None
-
set_item_texts
(name, renamed_items, text_key=None, axis_edit=None)¶ Rename or add item texts in the
items
objects ofmasks
.Parameters: - name (str) – The column variable name keyed in
_meta['masks']
. - renamed_items (dict) –
A dict mapping with following structure (array mask items are assumed to be passed by their order number):
>>> {1: 'new label for item #1', ... 5: 'new label for item #5'}
- text_key (str, default None) – Text key for text-based label information. Will automatically fall
back to the instance’s
text_key
property information if not provided. - axis_edit ({'x', 'y', ['x', 'y']}, default None) – If the
new_text
of the variable should only be considered temp. for build exports, the axes on that the edited text should appear can be provided.
Returns: The
DataSet
is modified inplace.Return type: None
-
set_missings
(var, missing_map='default', hide_on_y=True, ignore=None)¶ Flag category definitions for exclusion in aggregations.
Parameters: - var (str or list of str) – Variable(s) to apply the meta flags to.
- missing_map ('default' or list of codes or dict of {'flag': code(s)}, default 'default') – A mapping of codes to flags that can either be 'exclude' (globally ignored) or 'd.exclude' (only ignored in descriptive statistics). Codes provided in a list are flagged as 'exclude'. Passing 'default' uses a preset list of (TODO: specify) values for exclusion.
- ignore (str or list of str, default None) – A list of variables that should be ignored when applying missing flags via the ‘default’ list method.
Returns: Return type: None
-
set_property
(name, prop_name, prop_value, ignore_items=False)¶ Access and set the value of a meta object’s
properties
collection.Parameters: - name (str) – The originating column variable name keyed in
meta['columns']
ormeta['masks']
. - prop_name (str) – The property key name.
- prop_value (any) – The value to be set for the property. Must be of valid type and have allowed values(s) with regard to the property.
- ignore_items (bool, default False) – When
name
refers to a variable from the'masks'
collection, setting to True will ignore anyitems
and only apply the property to themask
itself.
Returns: Return type: None
-
set_text_key
(text_key)¶ Set the default text_key of the
DataSet
.Note
A lot of the instance methods will fall back to the default text key in
_meta['lib']['default text']
. It is therefore important to use this method with caution, i.e. ensure that the meta containstext
entries for thetext_key
set.Parameters: text_key ({'en-GB', 'da-DK', 'fi-FI', 'nb-NO', 'sv-SE', 'de-DE'}) – The text key that will be set in _meta['lib']['default text']
.Returns: Return type: None
-
set_value_texts
(name, renamed_vals, text_key=None, axis_edit=None)¶ Rename or add value texts in the ‘values’ object.
This method works for array masks and column meta data.
Parameters: - name (str) – The column variable name keyed in
_meta['columns']
or_meta['masks']
. - renamed_vals (dict) – A dict mapping with following structure:
{1: 'new label for code=1', 5: 'new label for code=5'}
Codes will be ignored if they do not exist in the ‘values’ object. - text_key (str, default None) – Text key for text-based label information. Will automatically fall
back to the instance’s
text_key
property information if not provided. - axis_edit ({'x', 'y', ['x', 'y']}, default None) – If
renamed_vals
should only be considered temp. for build exports, the axes on that the edited text should appear can be provided.
Returns: The
DataSet
is modified inplace.Return type: None
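A sketch of the renaming behaviour on a simplified 'values' object (texts flattened to plain strings instead of per-text_key mappings; the helper name is hypothetical):

```python
def rename_value_texts(values, renamed_vals):
    # Update the text of each matching code; unknown codes are ignored,
    # mirroring the behaviour documented above.
    for entry in values:
        if entry['value'] in renamed_vals:
            entry['text'] = renamed_vals[entry['value']]
    return values

values = [{'value': 1, 'text': 'old'}, {'value': 5, 'text': 'keep'}]
values = rename_value_texts(values, {1: 'new label for code=1', 99: 'ignored'})
```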
-
set_variable_text
(name, new_text, text_key=None, axis_edit=None)¶ Apply a new or update a column’s/masks’ meta text object.
Parameters: - name (str) – The originating column variable name keyed in
meta['columns']
ormeta['masks']
. - new_text (str) – The
text
(label) to be set. - text_key (str, default None) – Text key for text-based label information. Will automatically fall back to the instance’s text_key property information if not provided.
- axis_edit ({'x', 'y', ['x', 'y']}, default None) – If the
new_text
of the variable should only be considered temp. for build exports, the axes on that the edited text should appear can be provided.
Returns: The
DataSet
is modified inplace.Return type: None
-
set_verbose_errmsg
(verbose=True)¶
-
set_verbose_infomsg
(verbose=True)¶
-
slicing
(name, slicer, axis='y')¶ Set or update
rules[axis]['slicex']
meta for the named column.Quantipy builds will respect the kept codes and show them exclusively in results.
Note
This is not a replacement for
DataSet.set_missings()
as missing values are respected also in computations.Parameters: - name (str or list of str) – The column variable(s) name keyed in
_meta['columns']
- slicer (int or list of int) – Values indicated by their
int
codes will be shown inQuantipy.View.dataframe
s, respecting the provided order. - axis ({'x', 'y'}, default 'y') – The axis to slice the values on.
Returns: Return type: None
-
sorting
(name, on='@', within=False, between=False, fix=None, ascending=False, sort_by_weight='auto')¶ Set or update
rules['x']['sortx']
meta for the named column.Parameters: - name (str or list of str) – The column variable(s) name keyed in
_meta['columns']
- within (bool, default False) – Applies only to variables that have been aggregated by creating an
expand
grouping / overcode-styleView
: If True, will sort frequencies inside each group. - between (bool, default True) – Applies only to variables that have been aggregated by creating a
an
expand
grouping / overcode-styleView
: If True, will sort group and regular code frequencies with regard to each other. - fix (int or list of int, default None) – Values indicated by their
int
codes will be ignored in the sorting operation. - ascending (bool, default False) – By default frequencies are sorted in descending order. Specify
True
to sort ascending.
Returns: Return type: None
-
sources
(name)¶ Get the
_meta['columns']
elements for the passed array mask name.Parameters: name (str) – The mask variable name keyed in _meta['masks']
.Returns: sources – The list of source elements from the array definition. Return type: list
-
split
(save=False)¶ Return the
meta
anddata
components of the DataSet instance.Parameters: save (bool, default False) – If True, the meta
anddata
objects will be saved to disk, using the instance’sname
andpath
attributes to determine the file location.Returns: meta, data – The meta dict and the case data DataFrame as separate objects. Return type: dict, pandas.DataFrame
-
static
start_meta
(text_key='main')¶ Starts a new/empty Quantipy meta document.
Parameters: text_key (str, default 'main') – The default text key to be set into the new meta document. Returns: meta – Quantipy meta object Return type: dict
-
subset
(variables=None, from_set=None, inplace=False)¶ Create a cloned version of self with a reduced collection of variables.
Parameters: - variables (str or list of str, default None) – A list of variable names to include in the new DataSet instance.
- from_set (str) – The name of an already existing set to base the new DataSet on.
Returns: subset_ds – The new reduced version of the DataSet.
Return type: qp.DataSet
-
take
(condition)¶ Create an index slicer to select rows from the DataFrame component.
Parameters: condition (Quantipy logic expression) – A logical condition expressed as Quantipy logic that determines which subset of the case data rows to be kept. Returns: slicer – The indices fulfilling the passed logical condition. Return type: pandas.Index
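Conceptually, take() filters row positions by a condition; a sketch with a plain predicate standing in for a Quantipy logic expression (helper name hypothetical):

```python
def take(rows, condition):
    # Keep the positional indices of all cases fulfilling the condition.
    return [i for i, row in enumerate(rows) if condition(row)]

rows = [{'gender': 1}, {'gender': 2}, {'gender': 1}]
slicer = take(rows, lambda r: r['gender'] == 1)
```

The real method returns a pandas.Index that can then be used to slice the DataFrame component.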
-
text
(name, shorten=True, text_key=None, axis_edit=None)¶ Return the variables text label information.
Parameters: - name (str, default None) – The variable name keyed in
_meta['columns']
or_meta['masks']
. - shorten (bool, default True) – If True,
text
label meta from array items will not report the parent mask’stext
. Setting it to False will show the “full” label. - text_key (str, default None) – The default text key to be set into the new meta document.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: text – The text metadata.
Return type: str
-
to_array
(name, variables, label, safe=True)¶ Combines column variables with same
values
meta into an array.Parameters: - name (str) – Name of new grid.
- variables (list of str or list of dicts) – Variable names that become items of the array. New item labels can be added as dict. Example: variables = [‘q1_1’, {‘q1_2’: ‘shop 2’}, {‘q1_3’: ‘shop 3’}]
- label (str) – Text label for the mask itself.
- safe (bool, default True) – If True, the method will raise a
ValueError
if the provided variable name is already present in self. SelectFalse
to forcefully overwrite an existing variable with the same name (independent of its type).
Returns: Return type: None
-
to_delimited_set
(name, label, variables, from_dichotomous=True, codes_from_name=True)¶ Combines multiple single variables to new delimited set variable.
Parameters: - name (str) – Name of new delimited set
- label (str) – Label text for the new delimited set.
- variables (list of str or list of tuples) – Variables that get combined into the new delimited set. If they are dichotomous (from_dichotomous=True), the labels of the variables are used as category texts, or, if tuples are included, the second items will be used for the category texts. If the variables are categorical (from_dichotomous=False), the values of the variables need to be equal and are taken for the delimited set.
- from_dichotomous (bool, default True) – Define if the input variables are dichotomous or categorical.
- codes_from_name (bool, default True) – If from_dichotomous=True, the codes can be taken from the variable names if they are in the form 'q01_1', 'q01_3', … In this case the codes will be 1, 3, ….
Returns: Return type: None
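The dichotomous case can be sketched as follows (assuming the common Quantipy convention of storing delimited sets as ';'-separated code strings; the helper name is hypothetical):

```python
def dichotomous_to_delimited(rows, variables, codes):
    out = []
    for row in rows:
        # A code is selected when its dichotomous source variable is 1.
        selected = [str(c) for v, c in zip(variables, codes) if row.get(v) == 1]
        out.append(';'.join(selected) + ';' if selected else '')
    return out

rows = [{'q01_1': 1, 'q01_3': 1}, {'q01_1': 0, 'q01_3': 1}]
sets = dichotomous_to_delimited(rows, ['q01_1', 'q01_3'], [1, 3])
```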
-
transpose
(name, new_name=None, ignore_items=None, ignore_values=None, copy_data=True, text_key=None, overwrite=False)¶ Create a new array mask with transposed items / values structure.
This method will automatically create meta and case data additions in the
DataSet
instance.Parameters: - name (str) – The originating mask variable name keyed in
meta['masks']
. - new_name (str, default None) – The name of the new mask. If not provided explicitly, the new_name
will be constructed by suffixing the original
name
with '_trans', e.g. 'Q2Array_trans'
. - ignore_items (int or list of int, default None) – If provided, the items listed by their order number in the
_meta['masks'][name]['items']
object will not be part of the transposed array. This means they will be ignored while creating the new value codes meta. - ignore_codes (int or list of int, default None) – If provided, the listed code values will not be part of the transposed array. This means they will not be part of the new item meta.
- text_key (str) – The text key to be used when generating text objects, i.e. item and value labels.
- overwrite (bool, default False) – Overwrite variable if new_name is already included.
Returns: DataSet is modified inplace.
Return type: None
-
unbind
(name)¶ Remove the mask structure from arrays.
-
uncode
(target, mapper, default=None, intersect=None, inplace=True)¶ Create a new or copied series from data, recoded using a mapper.
Parameters: - target (str) – The variable name that is the target of the uncode. If it is keyed
in
_meta['masks']
the uncode is done for all mask items. If not found in_meta
this will fail with an error. - mapper (dict) – A mapper of {key: logic} entries.
- default (str, default None) – The column name to default to in cases where unattended lists are given in your logic, where an auto-transformation of {key: list} to {key: {default: list}} is provided. Note that lists in logical statements are themselves a form of shorthand and this will ultimately be interpreted as: {key: {default: has_any(list)}}.
- intersect (logical statement, default None) – If a logical statement is given here then it will be used as an implied intersection of all logical conditions given in the mapper.
- inplace (bool, default True) – If True, the
DataSet
will be modified inplace with new/updated columns. Will return a new recodedpandas.Series
instance if False.
Returns: Either the
DataSet._data
is modified inplace or a newpandas.Series
is returned.Return type: None or uncode_series
-
undimensionize
(names=None, mapper_to_meta=False)¶ Rename the dataset columns to remove Dimensions compatibility.
-
undimensionizing_mapper
(names=None)¶ Return a renaming dataset mapper for un-dimensionizing names.
Parameters: None – Returns: mapper – A renaming mapper in the form of a dict of {old: new} that maps Dimensions naming conventions to non-Dimensions naming conventions. Return type: dict
-
unify_values
(name, code_map, slicer=None, exclusive=False)¶ Use a mapping of old to new codes to replace code values in
_data
.Note
Experimental! Check results carefully!
Parameters: - name (str) – The column variable name keyed in
meta['columns']
. - code_map (dict) – A mapping of
{old: new}
;old
andnew
must be the int-type code values from the column meta data. - slicer (Quantipy logic statement, default None) – If provided, the values will only be unified for cases where the condition holds.
- exclusive (bool, default False) – If True, the recoded unified value will replace whatever is already
found in the
_data
column, ignoringdelimited set
typed data, to which new values would normally be appended.
Returns: Return type: None
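A simplified sketch of the code replacement (row indices stand in for the logic-statement slicer; as the method itself warns, check results carefully):

```python
def unify(column, code_map, slicer=None):
    # Replace old codes with their new counterparts, optionally only
    # for the row positions listed in `slicer`.
    out = list(column)
    for i, code in enumerate(out):
        if slicer is not None and i not in slicer:
            continue
        out[i] = code_map.get(code, code)
    return out

unified = unify([1, 2, 3, 2], {2: 1})
```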
-
unroll
(varlist, keep=None, both=None)¶ Replace mask with their items, optionally excluding/keeping certain ones.
Parameters: - varlist (list) – A list of meta
'columns'
and/or'masks'
names. - keep (str or list, default None) – The names of masks that will not be replaced with their items.
- both ('all', str or list of str, default None) – The names of masks that will be included both as themselves and as collections of their items.
Note
varlist can also contain nesting var1 > var2. The variables which are included in the nesting can also be controlled by keep and both, even if the variables are also included as a “normal” variable.
- Example::
>>> ds.unroll(varlist = ['q1', 'q1 > gender'], both='all') ['q1', 'q1_1', 'q1_2', 'q1 > gender', 'q1_1 > gender', 'q1_2 > gender']
Returns: unrolled – The modified varlist
.Return type: list
-
update
(data, on='identity', text_properties=None)¶ Update the
DataSet
with the case data entries found indata
.Parameters: - data (
pandas.DataFrame
) – A dataframe that contains a subset of columns from theDataSet
case data component. - on (str, default 'identity') – The column to use as a join key.
- text_properties (str/ list of str, default=None, {'all', [var_names]}) – Controls the update of the dataset_left properties with properties from the dataset_right. If None, properties from dataset_left will be updated by the ones from the dataset_right. If ‘all’, properties from dataset_left will be kept unchanged. Otherwise, specify the list of properties which will be kept unchanged in the dataset_left; all others will be updated by the properties from dataset_right.
Returns: DataSet is modified inplace.
Return type: None
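Since `data` is a pandas.DataFrame, the data-side effect of the join-key update can be sketched with plain pandas. This is a hedged sketch only; DataSet.update additionally reconciles the meta component.

```python
import pandas as pd

# Hedged sketch of the join-key update idea using plain pandas.
left = pd.DataFrame({'identity': [1, 2, 3], 'age': [30, 40, 50]})
patch = pd.DataFrame({'identity': [2, 3], 'age': [41, 51]})

updated = left.set_index('identity')
updated.update(patch.set_index('identity'))  # align on the key, overwrite matches
updated = updated.reset_index()
print(updated['age'].tolist())  # case 1 keeps 30; cases 2 and 3 take 41 and 51
```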
-
used_text_keys
()¶ Get a list of all text keys used in the dataset instance.
-
validate
(spss_limits=False, verbose=True)¶ Identify and report inconsistencies in the
DataSet
instance.- name:
- column/mask name and
meta[collection][var]['name']
are not identical - q_label:
- text object is badly formatted or has empty text mapping
- values:
- categorical variable does not contain values, value text is badly formatted or has empty text mapping
- text_keys:
- dataset.text_key is not included or existing text keys are not consistent (also for parents)
- source:
- parents or items do not exist
- codes:
- codes in data component are not included in meta component
- spss limit name:
- length of name is greater than spss limit (64 characters) (only shown if spss_limits=True)
- spss limit q_label:
- length of q_label is greater than spss limit (256 characters) (only shown if spss_limits=True)
- spss limit values:
- length of any value text is greater than spss limit (120 characters) (only shown if spss_limits=True)
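The first of these checks (the name rule) can be sketched against a toy meta document. This is a hedged sketch; the meta layout used here is a simplified assumption, not quantipy's full schema.

```python
# Hedged sketch of the 'name' consistency rule on a toy meta document:
# the key under meta['columns'] must match the stored 'name' field.
def name_mismatches(meta):
    return [key for key, defn in meta['columns'].items()
            if defn.get('name') != key]

meta = {'columns': {
    'q1': {'name': 'q1'},
    'q2': {'name': 'q2_renamed'},  # inconsistent on purpose
}}
print(name_mismatches(meta))  # ['q2']
```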
-
value_texts
(name, text_key=None, axis_edit=None)¶ Get categorical data’s text information.
Parameters: - name (str) – The column variable name keyed in
_meta['columns']
. - text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: texts – The list of category texts.
Return type: list
-
values
(name, text_key=None, axis_edit=None)¶ Get categorical data’s paired code and texts information from the meta.
Parameters: - name (str) – The column variable name keyed in
_meta['columns']
or_meta['masks']
. - text_key (str, default None) – The text_key that should be used when taking labels from the source meta.
- axis_edit ({'x', 'y'}, default None) – If provided the text_key is taken from the x/y edits dict.
Returns: values – The list of the numerical category codes and their
texts
packed as tuples.
Return type: list of tuples
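quantipy meta stores categorical values roughly as a list of mappings pairing a code with a text object keyed by text_key; a hedged sketch of what values() and value_texts() expose, on a toy meta with a simplified schema:

```python
# Toy meta document (simplified assumption of quantipy's schema).
meta = {'columns': {'q1': {'values': [
    {'value': 1, 'text': {'en-GB': 'Elephant'}},
    {'value': 2, 'text': {'en-GB': 'Mouse'}},
    {'value': 999, 'text': {'en-GB': 'No animal'}},
]}}}

def values_sketch(meta, name, text_key):
    """Paired (code, label) tuples, as values() returns them."""
    return [(v['value'], v['text'][text_key])
            for v in meta['columns'][name]['values']]

def value_texts_sketch(meta, name, text_key):
    """Labels only, as value_texts() returns them."""
    return [t for _, t in values_sketch(meta, name, text_key)]

print(values_sketch(meta, 'q1', 'en-GB'))
# [(1, 'Elephant'), (2, 'Mouse'), (999, 'No animal')]
print(value_texts_sketch(meta, 'q1', 'en-GB'))
# ['Elephant', 'Mouse', 'No animal']
```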
-
variables
(setname='data file', numeric=True, string=True, date=True, boolean=True, blacklist=None)¶ View all DataSet variables listed in their global order.
Parameters: - setname (str, default 'data file') – The name of the variable set to query. Defaults to the main variable collection stored via ‘data file’.
- numeric (bool, default True) – Include
int
andfloat
type variables? - string (bool, default True) – Include
string
type variables? - date (bool, default True) – Include
date
type variables? - boolean (bool, default True) – Include
boolean
type variables? - blacklist (list, default None) – A list of variables names to exclude from the variable listing.
Returns: varlist – The list of variables registered in the queried
set
.
Return type: list
-
vmerge
(dataset, on=None, left_on=None, right_on=None, row_id_name=None, left_id=None, right_id=None, row_ids=None, overwrite_text=False, from_set=None, uniquify_key=None, reset_index=True, inplace=True, text_properties=None, verbose=True)¶ Merge Quantipy datasets together by appending rows.
This function merges two Quantipy datasets together, updating variables that exist in the left dataset and appending others. New variables will be appended in the order indicated by the ‘data file’ set if found, otherwise they will be appended in alphanumeric order. This merge happens vertically (row-wise).
Parameters: - dataset ((A list of multiple)
quantipy.DataSet
) – One or multiple datasets to merge into the currentDataSet
. - on (str, default=None) – The column to use to identify unique rows in both datasets.
- left_on (str, default=None) – The column to use to identify unique rows in the left dataset.
- right_on (str, default=None) – The column to use to identify unique rows in the right dataset.
- row_id_name (str, default=None) – The named column will be filled with the ids indicated for each dataset, as per left_id/right_id/row_ids. If meta for the named column doesn’t already exist a new column definition will be added and assigned a reductive-appropriate type.
- left_id (str/int/float, default=None) – Where the row_id_name column is not already populated for the dataset_left, this value will be populated.
- right_id (str/int/float, default=None) – Where the row_id_name column is not already populated for the dataset_right, this value will be populated.
- row_ids (list of str/int/float, default=None) – When a list of datasets has been passed, this list provides the row ids that will be populated in the row_id_name column for each of those datasets, respectively.
- overwrite_text (bool, default=False) – If True, text_keys in the left meta that also exist in right meta will be overwritten instead of ignored.
- from_set (str, default=None) – Use a set defined in the right meta to control which columns are merged from the right dataset.
- uniquify_key (str, default None) – An int-like column name found in all the passed
DataSet
objects that will be protected from having duplicates. The original version of the column will be kept under its name prefixed with ‘original’. - reset_index (bool, default=True) – If True pandas.DataFrame.reindex() will be applied to the merged dataframe.
- inplace (bool, default True) – If True, the
DataSet
will be modified inplace with new/updated rows. Will return a newDataSet
instance if False. - merge_existing (str/ list of str, default None, {'all', [var_names]}) – Merge values for defined delimited sets if it exists in both datasets. (update_existing is prioritized)
- text_properties (str/ list of str, default=None, {'all', [var_names]}) – Controls how properties on the left dataset are updated with properties from the right dataset. If None, all left properties are updated from the right. If 'all', left properties are kept unchanged. Otherwise, pass a list of properties to keep unchanged on the left; all others are updated from the right.
- verbose (bool, default=True) – Echo progress feedback to the output pane.
Returns: None or new_dataset – If the merge is not applied
inplace
, aDataSet
instance is returned.
Return type: quantipy.DataSet
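The row-wise appending itself can be sketched with plain pandas. This is a hedged sketch; vmerge additionally reconciles meta, variable order, and ids. The 'wave' column plays the role of row_id_name, with left_id=1 and right_id=2.

```python
import pandas as pd

# Hedged sketch of the vertical merge idea with a row_id_name column.
left = pd.DataFrame({'identity': [1, 2], 'q1': [1, 2]})
right = pd.DataFrame({'identity': [3], 'q1': [3]})

left['wave'] = 1    # left_id
right['wave'] = 2   # right_id
merged = pd.concat([left, right], ignore_index=True)  # reset_index=True analogue
print(merged['wave'].tolist())  # [1, 1, 2]
```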
-
weight
(weight_scheme, weight_name='weight', unique_key='identity', subset=None, report=True, path_report=None, inplace=True, verbose=True)¶ Weight the
DataSet
according to a well-defined weight scheme.Parameters: - weight_scheme (quantipy.Rim instance) – A rim weights setup with defined targets. Can include multiple weight groups and/or filters.
- weight_name (str, default 'weight') – A name for the float variable that is added to pick up the weight factors.
- unique_key (str, default 'identity') – A variable inside the
DataSet
instance that will be used to map individual case weights to their matching rows. - subset (Quantipy complex logic expression) – A logic to filter the DataSet, weighting only the remaining subset.
- report (bool, default True) – If True, will report a summary of the weight algorithm run and factor outcomes.
- path_report (str, default None) – A file path to save an .xlsx version of the weight report to.
- inplace (bool, default True) – If True, the weight factors are merged back into the
DataSet
instance. Will otherwise return thepandas.DataFrame
that contains the weight factors, theunique_key
and all variables that have been used to compute the weights (filters, target variables, etc.).
Returns: Will either create a new column called
'weight'
in theDataSet
instance or return aDataFrame
that contains the weight factors.Return type: None or
pandas.DataFrame
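Rim weighting iterates proportional adjustments across several targets; for a single categorical target it reduces to cell weighting, which can be sketched without a quantipy.Rim scheme. Data and targets below are toy assumptions.

```python
import pandas as pd

# Hedged single-target sketch: weight = target share / observed share.
df = pd.DataFrame({'gender': [1, 1, 1, 2]})  # observed: 75% / 25%
targets = {1: 0.5, 2: 0.5}                   # desired:  50% / 50%

observed = df['gender'].value_counts(normalize=True)
df['weight'] = df['gender'].map(lambda g: targets[g] / observed[g])
print(round(df['weight'].mean(), 6))  # 1.0 (weights preserve the case count)
```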
-
write_dimensions
(path_mdd=None, path_ddf=None, text_key=None, run=True, clean_up=True, CRLF='CR')¶ Build Dimensions/SPSS Base Professional .ddf/.mdd data pairs.
Note
SPSS Data Collection Base Professional must be installed on the machine. The method creates .mrs and .dms scripts, which are executed through the software’s API.
Parameters: - path_mdd (str, default None) – The full path (optionally with extension
'.mdd'
, otherwise assumed as such) for the saved DataSet._meta component. If not provided, the instance’s name
and path
attributes will be used to determine the file location. - path_ddf (str, default None) – The full path (optionally with extension
'.ddf'
, otherwise assumed as such) for the saved DataSet._data component. If not provided, the instance’sname
and path
attributes will be used to determine the file location. - text_key (str, default None) – The desired
text_key
for alltext
label information. Uses theDataSet.text_key
information if not provided. - run (bool, default True) – If True, the method will try to run the metadata creating .mrs script and execute a DMSRun for the case data transformation in the .dms file.
- clean_up (bool, default True) – By default, all helper files from the conversion (.dms, .mrs, paired .csv files, etc.) will be deleted after the process has finished.
Returns: Return type: A .ddf/.mdd pair is saved at the provided path location.
-
write_quantipy
(path_meta=None, path_data=None)¶ Write the data and meta components to .csv/.json files.
The resulting files are well-defined native Quantipy source files.
Parameters: - path_meta (str, default None) – The full path (optionally with extension
'.json'
, otherwise assumed as such) for the saved DataSet._meta component. If not provided, the instance’s name
and path
attributes will be used to determine the file location. - path_data (str, default None) – The full path (optionally with extension
'.csv'
, otherwise assumed as such) for the saved DataSet._data component. If not provided, the instance’sname
and path
attributes will be used to determine the file location.
Returns: Return type: A .csv/.json pair is saved at the provided path location.
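The .csv/.json pair convention can be sketched with the standard library alone. This is a hedged sketch with a toy meta; write_quantipy itself serialises the full quantipy meta document alongside the case data.

```python
import csv
import json
import os
import tempfile

meta = {'columns': {'q1': {'type': 'single', 'name': 'q1'}}}  # toy meta
rows = [{'identity': 1, 'q1': 1}, {'identity': 2, 'q1': 2}]   # toy case data

with tempfile.TemporaryDirectory() as tmp:
    path_meta = os.path.join(tmp, 'example.json')  # meta component
    path_data = os.path.join(tmp, 'example.csv')   # data component
    with open(path_meta, 'w') as f:
        json.dump(meta, f)
    with open(path_data, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['identity', 'q1'])
        writer.writeheader()
        writer.writerows(rows)
    with open(path_meta) as f:
        meta_roundtrip = json.load(f)  # verify the pair round-trips

print(meta_roundtrip == meta)  # True
```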
-
write_spss
(path_sav=None, index=True, text_key=None, mrset_tag_style='__', drop_delimited=True, from_set=None, verbose=True)¶ Convert the Quantipy DataSet into a SPSS .sav data file.
Parameters: - path_sav (str, default None) – The full path (optionally with extension
'.sav'
, otherwise assumed as such) for the saved SPSS .sav file. If not provided, the instance’s name
and path
attributes will be used to determine the file location. - index (bool, default False) – Should the index be inserted into the dataframe before the conversion happens?
- text_key (str, default None) – The text_key that should be used when taking labels from the
source meta. If the given text_key is not found for any
particular text object, the
DataSet.text_key
will be used instead. - mrset_tag_style (str, default '__') – The delimiting character/string to use when naming dichotomous set variables. The mrset_tag_style will appear between the name of the variable and the dichotomous variable’s value name, as taken from the delimited set value that dichotomous variable represents.
- drop_delimited (bool, default True) – Should Quantipy’s delimited set variables be dropped from the export after being converted to dichotomous sets/mrsets?
- from_set (str) – The set name from which the export should be drawn.
Returns: Return type: A SPSS .sav file is saved at the provided path location.
-