DataSet Dimensions compatibility

DTO-downloaded and Dimensions converted variable naming conventions are following specific rules for array names and corresponding ìtems. DataSet offers a compatibility mode for Dimensions scenarios and handles the proper renaming automatically. Here is what you should know…

The compatibility mode

A DataSet will (by default) support Dimensions-like array naming for its connected data files when constructed. An array masks meta defintition of a variable called q5 looking like this…:

{u'items': [{u'source': u'columns@q5_1', u'text': {u'en-GB': u'Surfing'}},
  {u'source': u'columns@q5_2', u'text': {u'en-GB': u'Snowboarding'}},
  {u'source': u'columns@q5_3', u'text': {u'en-GB': u'Kite boarding'}},
  {u'source': u'columns@q5_4', u'text': {u'en-GB': u'Parachuting'}},
  {u'source': u'columns@q5_5', u'text': {u'en-GB': u'Cave diving'}},
  {u'source': u'columns@q5_6', u'text': {u'en-GB': u'Windsurfing'}}],
 u'subtype': u'single',
 u'text': {u'en-GB': u'How likely are you to do each of the following in the next year?'},
 u'type': u'array',
 u'values': u'lib@values@q5'}

…will be converted into its “Dimensions equivalent” as per:

>>> dataset = qp.DataSet(name_data, dimensions_comp=True)
>>> dataset.read_quantipy(path_data+name_data, path_data+name_data)
DataSet: ../Data/Quantipy/Example Data (A)
rows: 8255 - columns: 75
Dimensions compatibilty mode: True
>>> dataset.masks()
['q5.q5_grid', 'q6.q6_grid', 'q7.q7_grid']
>>> dataset._meta['masks']['q5.q5_grid']
{u'items': [{u'source': 'columns@q5[{q5_1}].q5_grid',
   u'text': {u'en-GB': u'Surfing'}},
  {u'source': 'columns@q5[{q5_2}].q5_grid',
   u'text': {u'en-GB': u'Snowboarding'}},
  {u'source': 'columns@q5[{q5_3}].q5_grid',
   u'text': {u'en-GB': u'Kite boarding'}},
  {u'source': 'columns@q5[{q5_4}].q5_grid',
   u'text': {u'en-GB': u'Parachuting'}},
  {u'source': 'columns@q5[{q5_5}].q5_grid',
   u'text': {u'en-GB': u'Cave diving'}},
  {u'source': 'columns@q5[{q5_6}].q5_grid',
   u'text': {u'en-GB': u'Windsurfing'}}],
 'name': 'q5.q5_grid',
 u'subtype': u'single',
 u'text': {u'en-GB': u'How likely are you to do each of the following in the next year?'},
 u'type': u'array',
 u'values': 'lib@values@q5.q5_grid'}

Accessing and creating array data

Since new names are converted automatically by DataSet methods, there is no need to write down the full (DTO-like) Dimensions array name when adding new metadata. However, querying variables is always requiring the proper name:

>>> name, qtype, label = 'array_var', 'single', 'ARRAY LABEL'
>>> cats =  ['A', 'B', 'C']
>>> items = ['1', '2', '3']
>>> dataset.add_meta(name, qtype, label, cats, items)
>>> dataset.masks()
['q5.q5_grid', 'array_var.array_var_grid', 'q6.q6_grid', 'q7.q7_grid']
>>> dataset.meta('array_var.array_var_grid')
single                                                                   items item texts  codes texts missing
array_var.array_var_grid: ARRAY LABEL
1                                      array_var[{array_var_1}].array_var_grid          1      1     A    None
2                                      array_var[{array_var_2}].array_var_grid          2      2     B    None
3                                      array_var[{array_var_3}].array_var_grid          3      3     C    None
>>> dataset['array_var.array_var_grid'].head(5)
   array_var[{array_var_1}].array_var_grid  array_var[{array_var_2}].array_var_grid  array_var[{array_var_3}].array_var_grid
0                                      NaN                                      NaN                                      NaN
1                                      NaN                                      NaN                                      NaN
2                                      NaN                                      NaN                                      NaN
3                                      NaN                                      NaN                                      NaN
4                                      NaN                                      NaN                                      NaN

As can been seen above, both the masks name as well as the array item elements are being properly converted to match DTO/Dimensions conventions.

When using rename(), copy() or transpose(), the same behaviour applies:

>>> dataset.rename('q6.q6_grid', 'q6new')
>>> dataset.masks()
['q5.q5_grid', 'array_var.array_var_grid', 'q6new.q6new_grid', 'q7.q7_grid']
>>> dataset.copy('q6new.q6new_grid', suffix='q6copy')
>>> dataset.masks()
['q5.q5_grid', 'q6new_q6copy.q6new_q6copy_grid', 'array_var.array_var_grid', 'q6new.q6new_grid', 'q7.q7_grid']
>>> dataset.transpose('q6new_q6copy.q6new_q6copy_grid')
>>> dataset.masks()
['q5.q5_grid', 'q6new_q6copy_trans.q6new_q6copy_trans_grid', 'q6new_q6copy.q6new_q6copy_grid', 'array_var.array_var_grid', 'q6new.q6new_grid', 'q7.q7_grid']