psiz.data

Module of convenience data classes.

class psiz.data.Categorize(stimulus_set=None, objective_query_label=None, name=None)

Content for categorization judgments.

export(export_format='tfds', with_timestep_axis=None)

Prepare trial content data for dataset.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

property is_actual

Return 2D Boolean array indicating trials with actual content.

class psiz.data.Content

Abstract class for trial content data.

abstract property is_actual

Return 2D Boolean array indicating trials with actual content.

Returns

is_actual – A 2D Boolean array. shape=(samples, sequence_length)
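As a rough illustration of what such an array looks like, the NumPy sketch below marks a trial as actual when its first stimulus index differs from a mask value. The stimulus_set array, the mask value of 0, and the choice of the first index are assumptions made for illustration; each subclass defines its own rule.

```python
import numpy as np

# Hypothetical trial content: shape=(samples, sequence_length, n_stimuli_per_trial).
# A row of zeros stands in for a placeholder (masked) trial.
mask_value = 0  # assumed mask value
stimulus_set = np.array([
    [[3, 1, 2], [4, 5, 6]],
    [[7, 8, 9], [0, 0, 0]],
])

# A trial counts as actual if its first stimulus index is not the mask value.
is_actual = np.not_equal(stimulus_set[:, :, 0], mask_value)
print(is_actual.shape)  # (2, 2), i.e. (samples, sequence_length)
```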

property mask_value

Getter method for mask_value.

property mask_zero

Getter method for mask_zero.

property unique_configurations

Generate a unique ID for each content configuration.

Convenience method that generates a unique ID for each unique content configuration.

Calls the subclass's _config_attrs in order to determine the unique configurations. It is assumed that all returned attributes have shape=(samples, sequence_length).

Returns
  • config_idx – A unique index for each type of trial configuration. shape=(samples, sequence_length)

  • df_config – A DataFrame containing all the unique trial configurations.
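The idea of assigning one index per distinct configuration can be sketched with plain NumPy. The two attribute arrays below (n_reference and n_select) are hypothetical stand-ins for whatever a subclass lists in _config_attrs, and a plain array replaces the DataFrame; this is not the library's implementation.

```python
import numpy as np

# Hypothetical per-trial attributes, each with shape=(samples, sequence_length).
n_reference = np.array([[2, 2, 8], [8, 2, 2]])
n_select = np.array([[1, 1, 2], [2, 1, 1]])
n_sample, sequence_length = n_reference.shape

# One row per trial, one column per attribute.
attrs = np.stack([n_reference.ravel(), n_select.ravel()], axis=1)

# Unique rows are the distinct configurations; the inverse maps each
# trial back to the row of its configuration.
unique_rows, inverse = np.unique(attrs, axis=0, return_inverse=True)
config_idx = np.reshape(inverse, (n_sample, sequence_length))
```

Here unique_rows plays the role of the table of configurations and config_idx holds, for every trial, the row of that table it belongs to.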

class psiz.data.Continuous(value, **kwargs)

A continuous outcome.

export(export_format='tfds', with_timestep_axis=None)

Return appropriately formatted data.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

class psiz.data.Dataset(components)

Generic composite class for data.

property components

Return all trial components.

export(export_format='tfds', with_timestep_axis=None)

Export trial data as model-consumable object.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, the dataset is exported with a timestep axis if any of the provided DataComponents were initialized with a timestep axis. Callers can override default behavior by setting this argument.

Returns

ds – A dataset that can be consumed by a model.

class psiz.data.DatasetComponent

Abstract class for dataset component.

abstract export(export_format='tfds', with_timestep_axis=None)

Return appropriately formatted data.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.
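To make the effect of a timestep axis concrete, the NumPy sketch below adds and removes one. It assumes the timestep axis is axis 1, following the shape=(samples, sequence_length, ...) convention used elsewhere in this module; how export performs the conversion internally is not stated here.

```python
import numpy as np

# Data without a timestep axis: shape=(samples, m).
x = np.arange(6).reshape(3, 2)

# With a timestep axis of length 1: shape=(samples, 1, m).
x_with = np.expand_dims(x, axis=1)

# Back to no timestep axis by merging the first two axes.
x_without = x_with.reshape((-1,) + x_with.shape[2:])
print(x_with.shape, x_without.shape)  # (3, 1, 2) (3, 2)
```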

class psiz.data.Group(value, name=None)

Base class for group membership data.

export(export_format='tfds', with_timestep_axis=None)

Export group membership data.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

class psiz.data.Outcome(name=None, sample_weight=None)

Base class for outcome data.

export(export_format='tfds', with_timestep_axis=None)

Export sample_weight.

Subclasses of Outcome must call super().export(…) in order to obtain sample weights.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

process_sample_weight()

Process sample_weight.

NOTE: Objects that subclass Outcome must call this method in the __init__ method after setting n_sample and sequence_length.

property sample_weight

Return sample weight.

class psiz.data.Rank(stimulus_set, n_select=None, name=None)

Content for ranked similarity judgments.

static as_sparse_outcome(n_reference, selection_indices)

Convert from selection indices to an outcome index.

Parameters
  • n_reference – Integer indicating the number of references.

  • selection_indices – Array-like of integers indicating the stimulus indices that were selected. The order of indices is assumed to correspond to the order that the selections were made.

Returns

An integer representing a sparse encoding of the outcome.

export(export_format='tfds', with_timestep_axis=None)

Prepare trial content data for dataset.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

property is_actual

Return 2D Boolean array indicating trials with actual content.

property n_outcome

Getter method for n_outcome.

Returns

The number of outcomes for a trial.

static possible_outcomes(n_reference, n_select)

Return the possible outcomes of a ranked trial.

Parameters
  • n_reference – Integer indicating the number of references.

  • n_select – Integer indicating the number of ranked selections.

Returns

A 2D array indicating all possible outcomes, where the values are indices of the reference stimuli. Each row corresponds to one outcome. Note that the indices refer to references only and do not include an index for the query. Also note that the unpermuted index is returned first.
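One way to enumerate outcomes with these properties is sketched below using itertools.permutations. The function name possible_outcomes_sketch is hypothetical, and placing the unselected references after the selections in ascending order is an assumption; it is not necessarily how the library builds the array.

```python
from itertools import permutations

import numpy as np


def possible_outcomes_sketch(n_reference, n_select):
    """Enumerate ranked outcomes as rows of reference indices."""
    outcomes = []
    # Every ordered choice of n_select references is one outcome.
    for selected in permutations(range(n_reference), n_select):
        # Assumption: unselected references follow in ascending order.
        unselected = [i for i in range(n_reference) if i not in selected]
        outcomes.append(list(selected) + unselected)
    return np.array(outcomes, dtype=int)


outcomes = possible_outcomes_sketch(n_reference=3, n_select=2)
print(outcomes.shape)  # (6, 3)
print(outcomes[0])     # [0 1 2], the unpermuted index
```

The number of rows is n_reference! / (n_reference - n_select)!, which is 6 for three references and two selections.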

class psiz.data.Rate(stimulus_set, name=None)

Trial content requiring similarity ratings.

export(export_format='tfds', with_timestep_axis=None)

Prepare trial content data for dataset.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

property is_actual

Return 2D Boolean array indicating trials with actual content.

class psiz.data.SparseCategorical(index, depth=None, **kwargs)

A categorical outcome.

export(export_format='tfds', with_timestep_axis=None)

Return appropriately formatted data.

Parameters
  • export_format (optional) – The output format of the dataset. By default the dataset is formatted as a tf.data.Dataset object.

  • with_timestep_axis (optional) – Boolean indicating if data should be returned with a timestep axis. By default, data is exported in the same format as it was provided at initialization. Callers can override default behavior by setting this argument.

psiz.data.sample_qr_sets(query_idx, n_reference, n_sample, reference_probability, replace=True, rng=None)

Sample query-reference sets for a specific query.

For problems involving more than a few stimuli, it is infeasible to generate an exhaustive list of stimulus sets. As an alternative, references can be stochastically sampled based on user-provided probabilities.

Parameters
  • query_idx – An integer indicating the query index.

  • n_reference – An integer indicating the number of references in each trial.

  • n_sample – Integer indicating the number of unique combinations to sample. If replace=False, then there is a limit on the number of unique samples; if n_sample is greater than the number of unique k-combinations, an exhaustive list of all k-combinations is returned (which will contain fewer samples than requested).

  • reference_probability – An array indicating the probability of selecting each stimulus as a reference given the query indicated by query_idx. It is assumed, but not checked, that all values are nonnegative. This array must include the probability of the query index so that index position semantics are preserved. The values do not need to sum to 1 since the array is normalized internally. shape=(n_stimuli,)

  • replace (optional) – Boolean indicating if the sampling is with or without replacement. The default is True, meaning that a particular k-combination can be sampled multiple times. If unique samples are desired, set replace=False.

  • rng (optional) – A NumPy random number generator that can be used to control stochasticity.

Returns

samples – A set of query-reference samples. shape=(n_samples, n_reference + 1)
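The sampling idea can be sketched with NumPy's Generator.choice. The function name sample_qr_sets_sketch is hypothetical, and zeroing the query's probability so that it is never drawn as a reference is an assumption. The sketch covers only the case where the same combination may be drawn more than once (the replace=True behavior).

```python
import numpy as np


def sample_qr_sets_sketch(query_idx, n_reference, n_sample,
                          reference_probability, rng=None):
    """Draw query-reference sets for one query (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.array(reference_probability, dtype=float)
    p[query_idx] = 0.0  # assumption: the query is never its own reference
    p = p / p.sum()     # normalize so the values sum to 1

    samples = np.empty((n_sample, n_reference + 1), dtype=int)
    samples[:, 0] = query_idx  # the first column holds the query
    for i in range(n_sample):
        # References within one set are distinct stimuli.
        samples[i, 1:] = rng.choice(
            p.size, size=n_reference, replace=False, p=p
        )
    return samples


rng = np.random.default_rng(0)
samples = sample_qr_sets_sketch(2, 2, 5, np.ones(6), rng=rng)
print(samples.shape)  # (5, 3)
```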

psiz.data.unravel_timestep(x)

Unravel sample and timestep axis into a single axis.

Parameters

x – A time-step based data structure. shape=(samples, sequence_length, [m, n, …])

Returns

New data structure. shape=(samples * sequence_length, [m, n, …])
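The shape change described above amounts to a reshape that merges the first two axes. The NumPy sketch below shows the equivalent operation; the function name unravel_timestep_sketch is hypothetical, and the library function may handle other input types.

```python
import numpy as np


def unravel_timestep_sketch(x):
    """Merge the leading sample and timestep axes into one axis."""
    x = np.asarray(x)
    n_sample, sequence_length = x.shape[0], x.shape[1]
    # Trailing axes, if any, are left untouched.
    return x.reshape((n_sample * sequence_length,) + x.shape[2:])


x = np.arange(24).reshape(2, 3, 4)  # shape=(samples, sequence_length, m)
flat = unravel_timestep_sketch(x)
print(flat.shape)  # (6, 4)
```

With the default row-major order, all timesteps of the first sample come first, followed by all timesteps of the second sample.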