DataSet components¶
Case and meta data¶
Quantipy
builds upon the pandas
library to feature the DataFrame
and Series
objects in the case data component of its DataSet
object.
Additionally, each DataSet
offers a metadata component to describe the
data columns and provide additional information on the characteristics of the
underlying structure. The metadata document is implemented as a nested dict
and provides the following keys
on its first level:
element | contains |
---|---|
'type' |
case data type |
'info' |
info on the source data |
'lib' |
shared use references |
'columns' |
info on DataFrame columns (Quantipy types, labels, etc.) |
'sets' |
ordered groups of variables pointing to other parts of the meta |
'masks' |
complex variable type definitions (arrays, dichotomous, etc.) |
columns
and masks
objects¶
There are two variable collections inside a Quantipy
metadata document:
'columns'
is storing the meta for each accompanying pandas.DataFrame
column object, while 'masks'
are building upon the regular 'columns'
metadata but additionally employ special meta instructions to define
complex data types. An example is the the 'array'
type that (in MR speak) maps
multiple “question” variables to one “answer” object.
“Simple”” data definitons that are supported by Quantipy
can either be numeric
'float'
and 'int'
types, categorical 'single'
and 'delimited set'
variables or of type 'string'
, 'date'
and 'time'
.
Languages: text
and text_key
mappings¶
Throughout Quantipy
metadata all label information, e.g. variable question
texts and category descriptions, are stored in text
objects that are mapping
different language (or context) versions of a label to a specific text_key
.
That way the metadata can support multi-language and multi-purpose (for example
detailed/extensive vs. short question texts) label information in a digestable
format that is easy to query:
>>> meta['columns']['q1']['text']
{'de-DE': 'Das ist ein langes deutsches Label',
u'en-GB': u'What is your main fitness activity?',
'x edits': {'de-DE': 'German build label', 'en-GB': 'English build label'}}
Valid text_key
settings are:
text_key |
Language / context |
---|---|
'en-GB' |
English |
'de-DE' |
German |
'fr-FR' |
French |
'da-DK' |
Danish |
'sv-SV' |
Swedish |
'nb-NO' |
Norwegian |
'fi-FI' |
Finnish |
'x edits' |
Build label edit for x-axis |
'y edits' |
Build label edit for y-axis |
Categorical values
object¶
single
and delimited set
variables restrict the possible case data
entries to a list of values
that consist of numeric answer codes and their
text
labels, defining distinct categories:
>>> meta['columns']['q1']['values']
[{'value': 1,
'text': {'en-GB': 'Dog'}
},
{'value': 2,
'text': {'en-GB': 'Cat'}
},
{'value': 3,
'text': {'en-GB': 'Bird'}
},
{'value': -9,
'text': {'en-GB': 'Not an animal'}
}]
The array
type¶
Turning to the masks
collection of the metadata, array
variables
group together a collection of variables that share a common response options
scheme, i.e. different statements (usually referencing a broader topic) that
are answered using the same scale. In the Quantipy
metadata document, an
array
variable has a subtype
that describes the type of the
constructing source variables listed in the items
object. In contrast to simple variable types, any
categorical values
metadata is stored inside the shared information collection
lib
, for access from both the columns
and masks
representation of
array
elements:
>>> meta['masks']['q5']
{u'items': [{u'source': u'columns@q5_1', u'text': {u'en-GB': u'Surfing'}},
{u'source': u'columns@q5_2', u'text': {u'en-GB': u'Snowboarding'}},
{u'source': u'columns@q5_3', u'text': {u'en-GB': u'Kite boarding'}},
{u'source': u'columns@q5_4', u'text': {u'en-GB': u'Parachuting'}},
{u'source': u'columns@q5_5', u'text': {u'en-GB': u'Cave diving'}},
{u'source': u'columns@q5_6', u'text': {u'en-GB': u'Windsurfing'}}],
u'name': u'q5',
u'subtype': u'single',
u'text': {u'en-GB': u'How likely are you to do each of the following in the next year?'},
u'type': u'array',
u'values': 'lib@values@q5'}
>>> meta['lib']['values']['q5']
[{u'text': {u'en-GB': u'I would refuse if asked'}, u'value': 1},
{u'text': {u'en-GB': u'Very unlikely'}, u'value': 2},
{u'text': {u'en-GB': u"Probably wouldn't"}, u'value': 3},
{u'text': {u'en-GB': u'Probably would if asked'}, u'value': 4},
{u'text': {u'en-GB': u'Very likely'}, u'value': 5},
{u'text': {u'en-GB': u"I'm already planning to"}, u'value': 97},
{u'text': {u'en-GB': u"Don't know"}, u'value': 98}]
Exploring the columns
meta of an array item shows the same values
reference pointer and informs about its parent
meta structure, i.e. the
array’s masks
defintion:
>>> meta['columns']['q5_1']
{u'name': u'q5_1',
u'parent': {u'masks@q5': {u'type': u'array'}},
u'text': {u'en-GB': u'How likely are you to do each of the following in the next year? - Surfing'},
u'type': u'single',
u'values': u'lib@values@q5'}