DataSet components

Case and meta data

Quantipy builds upon the pandas library to feature the DataFrame and Series objects in the case data component of its DataSet object. Additionally, each DataSet offers a metadata component to describe the data columns and provide additional information on the characteristics of the underlying structure. The metadata document is implemented as a nested dict and provides the following keys on its first level:

element      contains
'type'       case data type
'info'       info on the source data
'lib'        shared use references
'columns'    info on DataFrame columns (Quantipy types, labels, etc.)
'sets'       ordered groups of variables pointing to other parts of the meta
'masks'      complex variable type definitions (arrays, dichotomous, etc.)
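
As a rough illustration of the first-level layout, the sketch below builds an empty metadata skeleton and checks it for the six keys listed above. The helper name missing_top_level_keys and the placeholder contents (including the 'type' string) are assumptions made for this example, not part of the Quantipy API.

```python
# Top-level keys as listed in the table above.
EXPECTED_KEYS = {'type', 'info', 'lib', 'columns', 'sets', 'masks'}

# Hypothetical, empty skeleton; all contents are placeholders.
meta = {
    'type': 'pandas.DataFrame',  # assumed placeholder for the case data type
    'info': {},
    'lib': {'values': {}},
    'columns': {},
    'sets': {},
    'masks': {},
}

def missing_top_level_keys(meta):
    """Return the expected first-level keys that are absent, sorted by name."""
    return sorted(EXPECTED_KEYS - set(meta))

print(missing_top_level_keys(meta))
```

A check like this can be useful before working with a metadata document loaded from an unknown source.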

columns and masks objects

There are two variable collections inside a Quantipy metadata document: 'columns' stores the meta for each accompanying pandas.DataFrame column, while 'masks' builds upon the regular 'columns' metadata but additionally employs special meta instructions to define complex data types. An example is the 'array' type that (in MR speak) maps multiple “question” variables to one “answer” object.

“Simple” data definitions supported by Quantipy are the numeric 'float' and 'int' types, the categorical 'single' and 'delimited set' types, and the 'string', 'date' and 'time' types.
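
Since every 'columns' entry carries a 'type' key, variables can be grouped by their simple type with plain dict operations. The following is a minimal sketch; the column names and the helper columns_by_type are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical excerpt of a metadata document; only 'name' and 'type' are shown.
meta = {
    'columns': {
        'age': {'name': 'age', 'type': 'int'},
        'weight': {'name': 'weight', 'type': 'float'},
        'q1': {'name': 'q1', 'type': 'single'},
        'q2': {'name': 'q2', 'type': 'delimited set'},
        'comment': {'name': 'comment', 'type': 'string'},
    }
}

def columns_by_type(meta):
    """Group column names by the 'type' stated in their meta."""
    grouped = defaultdict(list)
    for name, col in meta['columns'].items():
        grouped[col['type']].append(name)
    return {qtype: sorted(names) for qtype, names in grouped.items()}

by_type = columns_by_type(meta)
# Categorical variables are those of type 'single' or 'delimited set'.
categorical = by_type.get('single', []) + by_type.get('delimited set', [])
print(categorical)
```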

Languages: text and text_key mappings

Throughout Quantipy metadata, all label information, e.g. variable question texts and category descriptions, is stored in text objects that key each language (or context) version of a label by a specific text_key. That way the metadata can support multi-language and multi-purpose (for example detailed/extensive vs. short question texts) label information in a digestible format that is easy to query:

>>> meta['columns']['q1']['text']
{'de-DE': 'Das ist ein langes deutsches Label',
 u'en-GB': u'What is your main fitness activity?',
 'x edits': {'de-DE': 'German build label', 'en-GB': 'English build label'}}

Valid text_key settings are:

text_key     Language / context
'en-GB'      English
'de-DE'      German
'fr-FR'      French
'da-DK'      Danish
'sv-SV'      Swedish
'nb-NO'      Norwegian
'fi-FI'      Finnish
'x edits'    Build label edit for x-axis
'y edits'    Build label edit for y-axis
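
A label lookup over such a text object might be sketched as follows. The helper get_label and its fallback rule (prefer an axis edit if one exists, otherwise use the plain text_key entry) are assumptions for this example; Quantipy's own lookup logic may differ.

```python
# Text object taken from the q1 example above.
text = {
    'de-DE': 'Das ist ein langes deutsches Label',
    'en-GB': 'What is your main fitness activity?',
    'x edits': {'de-DE': 'German build label', 'en-GB': 'English build label'},
}

def get_label(text, text_key, axis=None):
    """Return the label for text_key, preferring an axis edit if present.

    axis is 'x', 'y' or None. Returns None if no label is stored.
    """
    if axis is not None:
        edits = text.get('{} edits'.format(axis), {})
        if text_key in edits:
            return edits[text_key]
    return text.get(text_key)

print(get_label(text, 'en-GB'))       # regular label
print(get_label(text, 'de-DE', 'x'))  # build label edit for the x-axis
```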

Categorical values object

single and delimited set variables restrict the possible case data entries to a list of values that consist of numeric answer codes and their text labels, defining distinct categories:

>>> meta['columns']['q1']['values']
[{'value': 1,
  'text': {'en-GB': 'Dog'}
 },
 {'value': 2,
  'text': {'en-GB': 'Cat'}
 },
 {'value': 3,
  'text': {'en-GB': 'Bird'}
 },
 {'value': -9,
  'text': {'en-GB': 'Not an animal'}
 }]
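
To show how the values object relates codes to labels, the sketch below turns the list into a code-to-label dict and translates a few case data entries. The helper value_map and the sample codes are hypothetical.

```python
# Values object from the q1 example above.
values = [
    {'value': 1, 'text': {'en-GB': 'Dog'}},
    {'value': 2, 'text': {'en-GB': 'Cat'}},
    {'value': 3, 'text': {'en-GB': 'Bird'}},
    {'value': -9, 'text': {'en-GB': 'Not an animal'}},
]

def value_map(values, text_key='en-GB'):
    """Map each numeric answer code to its label for the given text_key."""
    return {v['value']: v['text'].get(text_key) for v in values}

mapping = value_map(values)

# Hypothetical case data entries for a 'single' variable.
codes = [1, 3, -9, 2]
labels = [mapping[code] for code in codes]
# Codes that are not defined in the values object.
invalid = [code for code in [1, 4] if code not in mapping]
print(labels, invalid)
```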

The array type

Turning to the masks collection of the metadata, array variables group together a collection of variables that share a common response options scheme, i.e. different statements (usually referencing a broader topic) that are answered using the same scale. In the Quantipy metadata document, an array variable has a subtype that describes the type of the constructing source variables listed in the items object. In contrast to simple variable types, any categorical values metadata is stored inside the shared information collection lib, for access from both the columns and masks representation of array elements:

>>> meta['masks']['q5']
{u'items': [{u'source': u'columns@q5_1', u'text': {u'en-GB': u'Surfing'}},
  {u'source': u'columns@q5_2', u'text': {u'en-GB': u'Snowboarding'}},
  {u'source': u'columns@q5_3', u'text': {u'en-GB': u'Kite boarding'}},
  {u'source': u'columns@q5_4', u'text': {u'en-GB': u'Parachuting'}},
  {u'source': u'columns@q5_5', u'text': {u'en-GB': u'Cave diving'}},
  {u'source': u'columns@q5_6', u'text': {u'en-GB': u'Windsurfing'}}],
 u'name': u'q5',
 u'subtype': u'single',
 u'text': {u'en-GB': u'How likely are you to do each of the following in the next year?'},
 u'type': u'array',
 u'values': 'lib@values@q5'}
>>> meta['lib']['values']['q5']
[{u'text': {u'en-GB': u'I would refuse if asked'}, u'value': 1},
 {u'text': {u'en-GB': u'Very unlikely'}, u'value': 2},
 {u'text': {u'en-GB': u"Probably wouldn't"}, u'value': 3},
 {u'text': {u'en-GB': u'Probably would if asked'}, u'value': 4},
 {u'text': {u'en-GB': u'Very likely'}, u'value': 5},
 {u'text': {u'en-GB': u"I'm already planning to"}, u'value': 97},
 {u'text': {u'en-GB': u"Don't know"}, u'value': 98}]
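
The '@'-separated strings in 'source' and 'values' read like paths into the nested metadata dict. Under that assumption, they can be followed with a small helper. The function names resolve, array_items and array_codes are hypothetical, and the sample meta is a shortened version of the q5 example.

```python
# Shortened, hypothetical excerpt of the q5 array example above.
meta = {
    'lib': {'values': {'q5': [
        {'value': 1, 'text': {'en-GB': 'I would refuse if asked'}},
        {'value': 98, 'text': {'en-GB': "Don't know"}},
    ]}},
    'columns': {
        'q5_1': {'name': 'q5_1', 'type': 'single', 'values': 'lib@values@q5'},
        'q5_2': {'name': 'q5_2', 'type': 'single', 'values': 'lib@values@q5'},
    },
    'masks': {'q5': {
        'name': 'q5', 'type': 'array', 'subtype': 'single',
        'items': [
            {'source': 'columns@q5_1', 'text': {'en-GB': 'Surfing'}},
            {'source': 'columns@q5_2', 'text': {'en-GB': 'Snowboarding'}},
        ],
        'values': 'lib@values@q5',
    }},
}

def resolve(meta, pointer):
    """Follow an '@'-separated pointer (assumed path syntax) into the meta."""
    node = meta
    for key in pointer.split('@'):
        node = node[key]
    return node

def array_items(meta, mask_name):
    """Return the column names of an array's items, in order."""
    items = meta['masks'][mask_name]['items']
    return [resolve(meta, item['source'])['name'] for item in items]

def array_codes(meta, mask_name):
    """Return the answer codes shared by all items of an array."""
    values = resolve(meta, meta['masks'][mask_name]['values'])
    return [v['value'] for v in values]

print(array_items(meta, 'q5'), array_codes(meta, 'q5'))
```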

Exploring the columns meta of an array item shows the same values reference pointer and identifies its parent meta structure, i.e. the array’s masks definition:

>>> meta['columns']['q5_1']
{u'name': u'q5_1',
 u'parent': {u'masks@q5': {u'type': u'array'}},
 u'text': {u'en-GB': u'How likely are you to do each of the following in the next year? - Surfing'},
 u'type': u'single',
 u'values': u'lib@values@q5'}
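
The 'parent' entry makes it possible to tell array items apart from stand-alone columns. The sketch below assumes that columns without an array parent have an empty or missing 'parent' key; the helper names parent_arrays and is_array_item are hypothetical.

```python
# Column meta of the array item from the example above (label text omitted).
column_meta = {
    'name': 'q5_1',
    'parent': {'masks@q5': {'type': 'array'}},
    'type': 'single',
    'values': 'lib@values@q5',
}

def parent_arrays(column_meta):
    """Return the names of all array masks listed as parents of a column."""
    parents = column_meta.get('parent', {})
    return [
        ref.split('@', 1)[1]
        for ref, info in parents.items()
        if ref.startswith('masks@') and info.get('type') == 'array'
    ]

def is_array_item(column_meta):
    """True if the column belongs to at least one array mask."""
    return bool(parent_arrays(column_meta))

print(parent_arrays(column_meta))
```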