psiz.data
Module of convenience data classes.
- class psiz.data.Categorize(stimulus_set=None, objective_query_label=None, name=None)
Content for categorization judgments.
- export(export_format='tfds', with_timestep_axis=None)
Prepare trial content data for dataset.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- property is_actual
Return 2D Boolean array indicating trials with actual content.
- class psiz.data.Content
Abstract class for trial content data.
- abstract property is_actual
Return 2D Boolean array indicating trials with actual content.
- Returns
is_actual: A 2D Boolean array. shape=(samples, sequence_length)
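As a rough illustration of what such an array looks like, the NumPy sketch below marks a trial as actual when its first stimulus index differs from a placeholder value. The array `stimulus_set`, the placeholder `mask_value = 0`, and the rule itself are assumptions made for this example; they are not taken from the psiz source.

```python
import numpy as np

# Hypothetical content: 2 samples, 3 timesteps, 3 stimuli per trial.
# Trials filled entirely with the placeholder value act as padding.
mask_value = 0  # assumed placeholder value
stimulus_set = np.array([
    [[1, 2, 3], [4, 5, 6], [0, 0, 0]],
    [[7, 8, 9], [0, 0, 0], [0, 0, 0]],
])

# Assumed rule: a trial is actual if its first entry is not the placeholder.
is_actual = np.not_equal(stimulus_set[:, :, 0], mask_value)
print(is_actual.shape)  # (2, 3)
```

The result has one Boolean per (sample, timestep) pair, which matches the documented shape=(samples, sequence_length).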
- property mask_value
Getter method for mask_value.
- property mask_zero
Getter method for mask_zero.
- property unique_configurations
Generate a unique ID for each content configuration.
Convenience method that generates a unique ID for each unique content configuration.
Calls the subclass's _config_attrs to determine the unique configurations. All returned attributes are assumed to have shape=(samples, sequence_length).
- Returns
config_idx: A unique index for each type of trial configuration. shape=(samples, sequence_length)
df_config: A DataFrame containing all the unique trial configurations.
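The idea of indexing configurations can be sketched with plain NumPy. The helper `index_configurations` and the two example attributes are hypothetical, and the sketch returns the unique configurations as an array rather than a DataFrame; it is meant to show the concept, not the psiz implementation.

```python
import numpy as np

def index_configurations(*attrs):
    """Assign one ID per unique combination of attribute values.

    Each attribute is assumed to have shape=(samples, sequence_length).
    """
    shape = np.asarray(attrs[0]).shape
    # One row per trial, one column per attribute.
    flat = np.stack([np.asarray(a).reshape(-1) for a in attrs], axis=1)
    configs, inverse = np.unique(flat, axis=0, return_inverse=True)
    # Reshape the per-trial IDs back to (samples, sequence_length).
    config_idx = np.reshape(inverse, shape)
    return config_idx, configs

# Hypothetical attributes: number of references and number of selections.
n_reference = np.array([[2, 8], [8, 2]])
n_select = np.array([[1, 2], [2, 1]])
config_idx, configs = index_configurations(n_reference, n_select)
```

Here the trials with (2 references, 1 selection) share one ID and the trials with (8 references, 2 selections) share another, so `config_idx` has the same shape as each input attribute.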
- class psiz.data.Continuous(value, **kwargs)
A continuous outcome.
- export(export_format='tfds', with_timestep_axis=None)
Return appropriately formatted data.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- class psiz.data.Dataset(components)
Generic composite class for data.
- property components
Return all trial components.
- export(export_format='tfds', with_timestep_axis=None)
Export trial data as model-consumable object.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, the dataset is exported with a timestep axis if any of the provided DataComponents were initialized with a timestep axis. Callers can override default behavior by setting this argument.
- Returns
ds: A dataset that can be consumed by a model.
- class psiz.data.DatasetComponent
Abstract class for dataset component.
- abstract export(export_format='tfds', with_timestep_axis=None)
Return appropriately formatted data.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- class psiz.data.Group(value, name=None)
Base class for group membership data.
- export(export_format='tfds', with_timestep_axis=None)
Export.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- class psiz.data.Outcome(name=None, sample_weight=None)
Base class for outcome data.
- export(export_format='tfds', with_timestep_axis=None)
Export sample_weight.
Subclasses of Outcome must call super().export(…) in order to obtain sample weights.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- process_sample_weight()
Process sample_weight.
NOTE: Objects that subclass Outcome must call this method in the __init__ method after setting n_sample and sequence_length.
- property sample_weight
Return sample weight.
- class psiz.data.Rank(stimulus_set, n_select=None, name=None)
Content for ranked similarity judgments.
- static as_sparse_outcome(n_reference, selection_indices)
Convert from selection indices to an outcome index.
- Parameters
n_reference – Integer indicating the number of references.
selection_indices – Array-like of integers indicating the stimulus indices that were selected. The order of indices is assumed to correspond to the order that the selections were made.
- Returns
An integer representing a sparse encoding of the outcome.
- export(export_format='tfds', with_timestep_axis=None)
Prepare trial content data for dataset.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- property is_actual
Return 2D Boolean array indicating trials with actual content.
- property n_outcome
Getter method for n_outcome.
- Returns
The number of outcomes for a trial.
- static possible_outcomes(n_reference, n_select)
Return the possible outcomes of a ranked trial.
- Parameters
n_reference – Integer indicating the number of references.
n_select – Integer indicating number of ranked selections.
- Returns
A 2D array indicating all possible outcomes, where the values are indices of the reference stimuli. Each row corresponds to one outcome. Note that the indices refer to references only and do not include an index for the query. Also note that the unpermuted index is returned first.
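A minimal sketch of how such outcomes could be enumerated is shown below, using only the standard library. The helper `sketch_possible_outcomes` is hypothetical, and it assumes that each outcome lists the ranked selections first, followed by the unselected references in ascending order. The row order produced by psiz may differ after the first (unpermuted) row.

```python
from itertools import permutations

def sketch_possible_outcomes(n_reference, n_select):
    """Enumerate ranked outcomes as rows of reference indices."""
    outcomes = []
    # Every ordered choice of n_select references is one outcome.
    for selected in permutations(range(n_reference), n_select):
        # Unselected references keep their original relative order.
        rest = [i for i in range(n_reference) if i not in selected]
        outcomes.append(list(selected) + rest)
    return outcomes

outcomes = sketch_possible_outcomes(3, 1)
print(outcomes)  # [[0, 1, 2], [1, 0, 2], [2, 0, 1]]
```

With 4 references and 2 ranked selections, the same sketch yields 4 * 3 = 12 rows, the first of which is the unpermuted row [0, 1, 2, 3].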
- class psiz.data.Rate(stimulus_set, name=None)
Trial content requiring similarity ratings.
- export(export_format='tfds', with_timestep_axis=None)
Prepare trial content data for dataset.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- property is_actual
Return 2D Boolean array indicating trials with actual content.
- class psiz.data.SparseCategorical(index, depth=None, **kwargs)
A categorical outcome.
- export(export_format='tfds', with_timestep_axis=None)
Return appropriately formatted data.
- Parameters
export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.
with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
- psiz.data.sample_qr_sets(query_idx, n_reference, n_sample, reference_probability, replace=True, rng=None)
Sample query-reference sets for a specific query.
For problems involving more than a few stimuli, it is infeasible to generate an exhaustive list of stimulus sets. As an alternative, references can be stochastically sampled based on user-provided probabilities.
- Parameters
query_idx – An integer indicating the query index.
n_reference – An integer indicating the number of references in each trial.
n_sample – Integer indicating the number of unique combinations to sample. If replace=False, then there is a limit on the number of unique samples; if n_sample is greater than the number of unique k-combinations, an exhaustive list of all k-combinations is returned (which will contain fewer samples than requested).
reference_probability – An array of values indicating the probability of selecting each stimulus as a reference given the query indicated by query_idx. The values are assumed, but not checked, to be nonnegative. This array must include the probability of the query index so that index position semantics are preserved. The values do not need to sum to 1 since the array is normalized internally. shape=(n_stimuli,)
replace (optional) – Boolean indicating if the sampling is with or without replacement. The default is True, meaning that a particular k-combination can be sampled multiple times. If unique samples are desired, set replace=False.
rng (optional) – A NumPy random number generator that can be used to control stochasticity.
- Returns
samples: A set of query-reference samples. shape=(n_samples, n_reference + 1)
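The sampling idea can be sketched in NumPy as follows. The helper `sketch_sample_qr_sets` is hypothetical and covers only the replace=True case, where the same set may appear in more than one row. Setting the query's own probability to zero before normalizing is an assumption of this sketch, not a documented detail of psiz.

```python
import numpy as np

def sketch_sample_qr_sets(query_idx, n_reference, n_sample,
                          reference_probability, rng=None):
    """Draw query-reference sets for one query (sets may repeat)."""
    rng = np.random.default_rng() if rng is None else rng
    prob = np.array(reference_probability, dtype=float)  # copy the input
    prob[query_idx] = 0.0      # assumed: a query is never its own reference
    prob = prob / prob.sum()   # normalize so the values sum to 1
    samples = np.empty((n_sample, n_reference + 1), dtype=int)
    samples[:, 0] = query_idx  # first column holds the query
    for i in range(n_sample):
        # References within one set are distinct.
        samples[i, 1:] = rng.choice(
            prob.size, size=n_reference, replace=False, p=prob
        )
    return samples

# Six stimuli with equal weights; query 3, two references per set.
samples = sketch_sample_qr_sets(
    3, 2, 5, np.ones(6), rng=np.random.default_rng(0)
)
print(samples.shape)  # (5, 3)
```

Passing a seeded generator, as in the usage lines, makes the draw reproducible.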
- psiz.data.unravel_timestep(x)
Unravel sample and timestep axis into a single axis.
- Parameters
x – A time-step based data structure. shape=(samples, sequence_length, [m, n, …])
- Returns
New data structure. shape=(samples * sequence_length, [m, n, …])
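The shape change described above amounts to merging the first two axes. The NumPy sketch below illustrates this with a hypothetical helper `sketch_unravel_timestep`; it assumes a plain array input, whereas the psiz function may accept other data structures.

```python
import numpy as np

def sketch_unravel_timestep(x):
    """Merge the sample and timestep axes into a single leading axis."""
    x = np.asarray(x)
    new_shape = (x.shape[0] * x.shape[1],) + x.shape[2:]
    return np.reshape(x, new_shape)

# 2 samples, 3 timesteps, 4 values per trial.
x = np.arange(24).reshape(2, 3, 4)
y = sketch_unravel_timestep(x)
print(y.shape)  # (6, 4)
```

Because the reshape uses row-major order, all timesteps of the first sample come before those of the second, so `y[3]` equals `x[1, 0]`.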