I/O¶
Starting from native components¶
Using a standalone pd.DataFrame¶
Quantipy can create a meta document from a pd.DataFrame by inferring the variable types from its dtypes. In that process, int, float and string data types are created inside the meta component of the DataSet.
In this basic form, text label information is missing. For example, given a pd.DataFrame as per:
>>> import numpy as np
>>> import pandas as pd
>>> casedata = [[1000, 10, 1.2, 'text1'],
... [1001, 4, 3.4, 'jjda'],
... [1002, 8, np.nan, 'what?'],
... [1003, 8, 7.81, '---' ],
... [1004, 5, 3.0, 'hello world!']]
>>> df = pd.DataFrame(casedata, columns=['identity', 'q1', 'q2', 'q3'])
>>> df
identity q1 q2 q3
0 1000 10 1.20 text1
1 1001 4 3.40 jjda
2 1002 8 NaN what?
3 1003 8 7.81 ---
4 1004 5 3.00 hello world!
… the conversion adds matching metadata to the DataSet instance:
>>> dataset = qp.DataSet(name='example', dimensions_comp=False)
>>> dataset.from_components(df)
Inferring meta data from pd.DataFrame.columns (4)...
identity: dtype: int64 - converted: int
q1: dtype: int64 - converted: int
q2: dtype: float64 - converted: float
q3: dtype: object - converted: string
>>> dataset.meta()['columns']['q2']
{'text': {'en-GB': ''}, 'type': 'float', 'name': 'q2', 'parent': {}, 'properties': {'created': True}}
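The empty text labels left behind by the inference can be filled in afterwards. A minimal sketch, assuming the DataSet method set_variable_text() is available in your build (the label wording is purely illustrative):
>>> # Attach an 'en-GB' question label to the inferred 'q2' column
>>> # (assumes DataSet.set_variable_text() exists; label is illustrative)
>>> dataset.set_variable_text('q2', 'Example rating question')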
.csv / .json pairs¶
We can easily read in Quantipy native data with the read_quantipy() method by providing the paths to both the .csv and .json files (file extensions are handled automatically), e.g.:
>>> folder = './Data/'
>>> file_name = 'Example Data (A)'
>>> path_csv = path_json = folder + file_name
>>> dataset = qp.DataSet(name='example', dimensions_comp=False)
>>> dataset.read_quantipy(path_json, path_csv)
DataSet: ./Data/example
rows: 8255 - columns: 76
Dimensions compatibility mode: False
We can then access the case and metadata components:
>>> dataset.data()['q4'].head()
0 1
1 2
2 2
3 1
4 1
Name: q4, dtype: int64
>>> import json
>>> meta = dataset.meta()['columns']['q4']
>>> print(json.dumps(meta, indent=1))
{
"values": [
{
"text": {
"en-GB": "Yes"
},
"value": 1
},
{
"text": {
"en-GB": "No"
},
"value": 2
}
],
"text": {
"en-GB": "Do you ever participate in sports activities with people in your household?"
},
"type": "single",
"name": "q4",
"parent": {}
}
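Writing a DataSet back to disk as a .csv/.json pair works analogously. A minimal sketch, assuming the write_quantipy() I/O function mirrors read_quantipy() and takes the meta and data components followed by the two target paths:
>>> # Hypothetical round-trip: persist the components as a .csv/.json pair
>>> # (assumes write_quantipy(meta, data, path_json, path_csv) is available)
>>> from quantipy.core.tools.dp.io import write_quantipy
>>> write_quantipy(dataset.meta(), dataset.data(),
...                folder + 'example.json', folder + 'example.csv')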
Third party conversions¶
Supported conversions¶
In addition to providing plain .csv/.json data (pairs), source files can be read into Quantipy using a number of I/O functions to deal with standard file formats encountered in the market research industry:
Software        | Format                    | Read | Write
----------------|---------------------------|------|------
SPSS Statistics | .sav                      | Yes  | Yes
SPSS Dimensions | .ddf/.mdd                 | Yes  | Yes
Decipher        | tab-delimited .json/.txt  | Yes  | No
Ascribe         | tab-delimited .xml/.txt   | Yes  | No
The following functions are designed to convert the different file formats’ structures into inputs understood by Quantipy.
SPSS Statistics¶
Reading:
>>> from quantipy.core.tools.dp.io import read_spss
>>> meta, data = read_spss(path_sav)
Note
On a Windows machine you MUST use ioLocale=None when reading from SPSS. This means that if you are using a Windows machine, your base example for reading from SPSS is meta, data = read_spss(path_sav, ioLocale=None).
When reading from SPSS you have the opportunity to specify a custom dichotomous values map that will be used to convert all dichotomous sets into Quantipy delimited sets, using the dichot argument. The entire read operation will use the same map on all dichotomous sets, so it must apply uniformly throughout the SAV file. The default map that will be used if none is provided is {'yes': 1, 'no': 0}.
>>> meta, data = read_spss(path_sav, dichot={'yes': 1, 'no': 2})
SPSS dates will be converted to pandas dates by default, but if this results in conversion issues or failures you can read the dates in as Quantipy strings to deal with them later, using the dates_as_strings argument.
>>> meta, data = read_spss(path_sav, dates_as_strings=True)
Writing:
>>> from quantipy.core.tools.dp.io import write_spss
>>> write_spss(path_sav, meta, data)
By default SPSS files will be generated from the 'data file' set found in meta['sets'], but a custom set can be named instead using the from_set argument.
>>> write_spss(path_sav_analysis, meta, data, from_set='sav-export')
The custom set must be well-formed:
>>> "sets" : {
... "sav-export": {
... "items": [
... "columns@Q1",
... "columns@Q2",
... "columns@Q3",
... ...
... ]
... }
... }
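Since sets live as plain dictionaries inside the metadata document, such a set can be registered directly before exporting. A minimal sketch based on the structure shown above (the 'sav-export' name and the three columns are just illustrative):
>>> # Register an illustrative custom set, then export only its items
>>> meta['sets']['sav-export'] = {
...     'items': ['columns@Q1', 'columns@Q2', 'columns@Q3']
... }
>>> write_spss(path_sav_analysis, meta, data, from_set='sav-export')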
Dimensions¶
Reading:
>>> from quantipy.core.tools.dp.io import read_dimensions
>>> meta, data = read_dimensions(path_mdd, path_ddf)
Decipher¶
Reading:
>>> from quantipy.core.tools.dp.io import read_decipher
>>> meta, data = read_decipher(path_json, path_txt)
Ascribe¶
Reading:
>>> from quantipy.core.tools.dp.io import read_ascribe
>>> meta, data = read_ascribe(path_xml, path_txt)
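All of these readers return a (meta, data) pair, which can then be wrapped in a DataSet. A minimal sketch, assuming from_components() also accepts a metadata document alongside the case data:
>>> # Hypothetical hand-off of a third-party source into a DataSet
>>> # (assumes from_components() takes an optional meta document)
>>> meta, data = read_ascribe(path_xml, path_txt)
>>> dataset = qp.DataSet(name='ascribe_example', dimensions_comp=False)
>>> dataset.from_components(data, meta)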