Version: 1.6.0

dataset

This module provides the Spotlight dataset.

Classes

Dataset(filepath, mode)

Spotlight dataset.

Static methods

check_column_name(name)

Check a column name.

Instance variables

filepath

Dataset file name.

mode

Dataset file open mode.

Methods

append_array_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None)

Create and optionally fill a numpy array column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

Example

>>> import numpy as np
>>> from renumics.spotlight import Dataset
>>> array_data = np.random.rand(5,3)
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_array_column("arrays", 5*[array_data])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["arrays", 2].shape)
(5, 3)

append_audio_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False, lossy=None)

Create and optionally fill an audio column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

lookup : Optional data lookup or flag for automatic lookup creation. If False (default if external is True), never add data to the lookup. If True (default if external is False), add all given files to the lookup and do nothing for explicitly given data. If a lookup is given, store it explicitly; further behaviour is as for True. If the given lookup is not a dict, its keys are created automatically.

external : Whether the column should only contain paths/URLs to the data and load it on demand.

lossy : Whether to store data lossy or lossless (lossless is the default if external is False). Using it with external=True is not recommended, since it requires on-demand transcoding, which slows down execution.

Example

Find an example usage in renumics.spotlight.media.Audio.
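The lookup rules above can be read as a small decision table. The sketch below models them in plain Python under stated assumptions: `resolve_lookup` and its `(add_files_to_lookup, explicit_lookup, auto_keys)` return tuple are hypothetical illustrations, not part of the Spotlight API, and the sketch does not model how keys are generated or how data is stored. The same lookup rules are documented for image, mesh and video columns.

```python
from typing import Any, Optional, Tuple


def resolve_lookup(lookup: Any, external: bool) -> Tuple[bool, Optional[Any], bool]:
    """Model the documented `lookup` rules (hypothetical helper, not Spotlight API).

    Returns (add_files_to_lookup, explicit_lookup, auto_keys).
    """
    if lookup is None:
        # Not given: default to False for external columns, True otherwise.
        lookup = not external
    if isinstance(lookup, bool):
        # A flag: either never use a lookup, or add all given files to it.
        return lookup, None, False
    # An explicit lookup is stored and otherwise behaves like True;
    # its keys are generated automatically when it is not a dict.
    return True, lookup, not isinstance(lookup, dict)
```

For example, `resolve_lookup(None, True)` yields `(False, None, False)`: an external column does not use a lookup unless one is requested.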

append_bool_column(self, name, values=None, order=None, hidden=False, optional=False, default=False, description=None, tags=None, editable=True)

Create and optionally fill a boolean column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

>>> from renumics.spotlight import Dataset
>>> value = False
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool_values", 5*[value])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["bool_values", 2])
False

append_bounding_box_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)

Create and optionally fill an axis-aligned bounding box column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

append_categorical_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True, categories=None)

Create and optionally fill a categorical column.

Args

name : Column name.

categories : The allowed categories for this column ("" is not allowed).

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than the empty string is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

Find an example usage in renumics.spotlight.dtypes.Category.
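A minimal plain-Python sketch of the constraint on `categories`. The helper `check_categorical` is hypothetical and not part of Spotlight; it assumes, based on the isnull description, that a value of "" marks a missing entry rather than a category, and that any other value must be one of the allowed categories.

```python
from typing import Iterable, List


def check_categorical(values: Iterable[str], categories: List[str]) -> List[str]:
    """Validate categorical values (hypothetical helper, not Spotlight API)."""
    if "" in categories:
        raise ValueError('"" is not allowed as a category')
    allowed = set(categories)
    checked = []
    for value in values:
        # Assumption: "" marks a missing value rather than a category.
        if value != "" and value not in allowed:
            raise ValueError(f"unknown category: {value!r}")
        checked.append(value)
    return checked
```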

append_column(self, name, dtype, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, **attrs)

Create and optionally fill a dataset column of the given type.

Args

name : Column name.

dtype : Column type.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

attrs : Optional arguments for the respective append column method.

Example

>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_column("int", int, range(5))
...     dataset.append_column("float", float, 1.0)
...     dataset.append_column("bool", bool, True)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(len(dataset))
...     print(sorted(dataset.keys()))
5
['bool', 'float', 'int']
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["int"])
...     print(dataset["bool"])
...     print(dataset["float"])
[0 1 2 3 4]
[ True  True  True  True  True]
[1. 1. 1. 1. 1.]

append_dataset(self, dataset)

Append a dataset to the current dataset row-wise.

append_datetime_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None)

Create and optionally fill a datetime column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

Example

>>> import datetime
>>> from renumics.spotlight import Dataset
>>> date = datetime.datetime.now()
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_datetime_column("dates", 5*[date])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["dates", 2] < datetime.datetime.now())
True

append_embedding_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, length=None, dtype='float32')

Create and optionally fill an embedding column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

dtype : A valid float numpy dtype. Default is "float32".

Example

Find an example usage in renumics.spotlight.dtypes.Embedding.

append_float_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)

Create and optionally fill a float column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than NaN is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

Find a similar example usage in renumics.spotlight.dataset.Dataset.append_bool_column.

append_image_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)

Create and optionally fill an image column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

lookup : Optional data lookup or flag for automatic lookup creation. If False (default if external is True), never add data to the lookup. If True (default if external is False), add all given files to the lookup and do nothing for explicitly given data. If a lookup is given, store it explicitly; further behaviour is as for True. If the given lookup is not a dict, its keys are created automatically.

external : Whether the column should only contain paths/URLs to the data and load it on demand.

Example

Find an example usage in renumics.spotlight.dtypes.Image.

append_int_column(self, name, values=None, order=None, hidden=False, optional=False, default=-1, description=None, tags=None, editable=True)

Create and optionally fill an integer column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

Find a similar example usage in renumics.spotlight.dataset.Dataset.append_bool_column.

append_mesh_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)

Create and optionally fill a mesh column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

lookup : Optional data lookup or flag for automatic lookup creation. If False (default if external is True), never add data to the lookup. If True (default if external is False), add all given files to the lookup and do nothing for explicitly given data. If a lookup is given, store it explicitly; further behaviour is as for True. If the given lookup is not a dict, its keys are created automatically.

external : Whether the column should only contain paths/URLs to the data and load it on demand.

Example

Find an example usage in renumics.spotlight.dtypes.Mesh.

append_row(self, **values)

Append a row to the dataset.

Args

values : A mapping from column name to value. The keys must match the dataset column names exactly, except that optional columns may be omitted.

Example

>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool_values")
...     dataset.append_float_column("float_values")
>>> data = {"bool_values": True, "float_values": 0.2}
>>> with Dataset("docs/example.h5", "a") as dataset:
...     dataset.append_row(**data)
...     dataset.append_row(**data)
...     print(dataset["float_values", 1])
0.2
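The key rule for `values` can be sketched in plain Python. The helper `complete_row` and its arguments are hypothetical and not part of the Spotlight API; the sketch assumes that an omitted or None value for an optional column is replaced by that column's default, as described for the `default` argument of the append column methods.

```python
from typing import Any, Dict, List


def complete_row(
    row: Dict[str, Any], required: List[str], optional_defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Check row keys and fill optional columns (hypothetical helper)."""
    unknown = set(row) - set(required) - set(optional_defaults)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    missing = [name for name in required if name not in row]
    if missing:
        raise KeyError(f"missing values for columns: {missing}")
    completed = dict(row)
    for name, default in optional_defaults.items():
        # Optional columns fall back to their default if omitted or None.
        if completed.get(name) is None:
            completed[name] = default
    return completed
```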

append_sequence_1d_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, x_label=None, y_label=None)

Create and optionally fill a 1d-sequence column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

x_label : Optional x-axis label.

y_label : Optional y-axis label. If None, the column name is used.

Example

Find an example usage in renumics.spotlight.dtypes.Sequence1D.

append_string_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)

Create and optionally fill a string column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than the empty string is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

Find a similar example usage in renumics.spotlight.dataset.Dataset.append_bool_column.

append_video_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)

Create and optionally fill a video column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

lookup : Optional data lookup or flag for automatic lookup creation. If False (default if external is True), never add data to the lookup. If True (default if external is False), add all given files to the lookup and do nothing for explicitly given data. If a lookup is given, store it explicitly; further behaviour is as for True. If the given lookup is not a dict, its keys are created automatically.

external : Whether the column should only contain paths/URLs to the data and load it on demand.

append_window_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)

Create and optionally fill a window column.

Args

name : Column name.

values : Optional column values. If a single value is given, the whole column is filled with this value.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

editable : Whether the column is editable in Spotlight.

Example

Find an example usage in renumics.spotlight.dtypes.Window.

close(self)

Close the file.

from_csv(self, filepath, dtypes=None, columns=None, workdir=None)

Import a CSV file to the dataset.

Args

filepath : Path of the CSV file to read.

dtypes : Optional dict mapping column names to column types allowed by Spotlight.

columns : Optional columns to read from the CSV file. If not set, all columns are read.

workdir : Optional folder where audio/images/meshes are stored. If None, the folder of the CSV file is used.
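The role of `workdir` can be sketched as path resolution. This is an assumption about the mechanics, not Spotlight's code: the hypothetical `resolve_media_path` resolves a relative audio/image/mesh path from a CSV cell against `workdir`, or against the folder of the CSV file if `workdir` is None, and leaves absolute paths unchanged. URLs are not modelled, and for `from_pandas` the documented fallback is the current folder instead.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_media_path(
    value: PathLike, csv_path: PathLike, workdir: Optional[PathLike] = None
) -> Path:
    """Resolve a media path from a CSV cell (hypothetical helper)."""
    base = Path(workdir) if workdir is not None else Path(csv_path).parent
    path = Path(value)
    # Absolute paths are kept; relative ones are taken relative to the base folder.
    return path if path.is_absolute() else base / path
```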

from_pandas(self, df, index=False, dtypes=None, workdir=None)

Import a pandas dataframe to the dataset.

Only scalar types supported by the Spotlight dataset are imported; the others are listed in a warning message.

Args

df : pandas.DataFrame to import.

index : Whether to import the index of the dataframe as a regular dataset column.

dtypes : Optional dict mapping column names to column types allowed by Spotlight.

workdir : Optional folder where audio/images/meshes are stored. If None, the current folder is used.

Example

>>> from datetime import datetime
>>> import pandas as pd
>>> from renumics.spotlight import Dataset
>>> df = pd.DataFrame(
...     {
...         "bools": [True, False, False],
...         "ints": [-1, 0, 1],
...         "floats": [-1.0, 0.0, 1.0],
...         "strings": ["a", "b", "c"],
...         "datetimes": datetime.now().astimezone(),
...     }
... )
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.from_pandas(df, index=False)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(len(dataset))
...     print(sorted(dataset.keys()))
3
['bools', 'datetimes', 'floats', 'ints', 'strings']

get_column_attributes(self, name)

Get attributes of a column. Available but unset attributes contain None.

Args

name : Column name.

Example

>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_int_column("int", range(5))
...     dataset.append_int_column(
...         "int1",
...         hidden=True,
...         default=10,
...         description="integer column",
...         tags=["important"],
...         editable=False,
...     )
>>> with Dataset("docs/example.h5", "r") as dataset:
...     attributes = dataset.get_column_attributes("int")
...     for key in sorted(attributes.keys()):
...         print(key, attributes[key])
default -1
description None
editable True
hidden False
optional True
order None
tags None
>>> with Dataset("docs/example.h5", "r") as dataset:
...     attributes = dataset.get_column_attributes("int1")
...     for key in sorted(attributes.keys()):
...         print(key, attributes[key])
default 10
description integer column
editable False
hidden True
optional True
order None
tags ['important']

get_dtype(self, column_name)

Get the type of a dataset column.

Args

column_name : Column name.

Example

>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool")
...     dataset.append_datetime_column("datetime")
...     dataset.append_array_column("array")
...     dataset.append_mesh_column("mesh")
>>> with Dataset("docs/example.h5", "r") as dataset:
...     for column_name in sorted(dataset.keys()):
...         print(column_name, dataset.get_dtype(column_name))
array array
bool bool
datetime datetime
mesh Mesh

insert_row(self, index, values)

Insert a row into the dataset at the given index.

Example

>>> from renumics.spotlight import Dataset
>>> with Dataset("example.h5", "w") as dataset:
...     dataset.append_float_column("floats", [-1.0, 0.0, 1.0])
...     dataset.append_int_column("ints", [-1, 0, 2])
...     print(len(dataset))
...     print(dataset["floats"])
...     print(dataset["ints"])
3
[-1.  0.  1.]
[-1  0  2]
>>> with Dataset("example.h5", "a") as dataset:
...     dataset.insert_row(2, {"floats": float("nan"), "ints": 1000})
...     dataset.insert_row(-3, {"floats": 3.14, "ints": -1000})
...     print(len(dataset))
...     print(dataset["floats"])
...     print(dataset["ints"])
5
[-1.    3.14  0.     nan  1.  ]
[   -1 -1000     0  1000     2]

isnull(self, column_name)

Get the missing-values mask for the given column.

None, NaN and category "" values are mapped to True, so the null mask for columns of type bool, int and string always contains only False values. A Window is mapped to True only if both its start and end are NaN.

iterrows(self, column_names=None)

Iterate through dataset rows.
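A minimal sketch of row iteration over column-oriented data. It assumes, for illustration only, that the dataset is a dict of equally long column lists and that each row is yielded as a dict from column name to value; `iter_rows` is a hypothetical stand-in, not Spotlight's implementation, and `column_names` restricts the yielded columns as in the signature above.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_rows(
    columns: Dict[str, List[Any]], column_names: Optional[List[str]] = None
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row (hypothetical model of Dataset.iterrows)."""
    names = list(columns) if column_names is None else list(column_names)
    # All columns are assumed to have the same length; an empty dataset yields nothing.
    length = len(next(iter(columns.values()), []))
    for index in range(length):
        yield {name: columns[name][index] for name in names}
```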

keys(self)

Get dataset column names.

notnull(self, column_name)

Get the non-missing-values mask for the given column.

None, NaN and category "" values are mapped to False, so the non-null mask for columns of type bool, int and string always contains only True values. A Window is mapped to True if at least one of its values is not NaN.
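The masking rules of isnull and notnull can be modelled per cell in plain Python. The dtype labels (`"bool"`, `"int"`, `"str"`, `"Category"`, `"Window"`, `"float"`) and the helper names are hypothetical, windows are assumed to be `(start, end)` pairs, and the sketch only mirrors the documented rules; Spotlight itself computes the masks for whole dataset columns.

```python
import math
from typing import Any, List


def is_null(value: Any, dtype: str) -> bool:
    """Model the documented null rules for one cell (hypothetical helper)."""
    if dtype in ("bool", "int", "str"):
        return False  # these column types never report missing values
    if dtype == "Window":
        start, end = value
        return math.isnan(start) and math.isnan(end)
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    # Only categorical columns treat the empty string as missing.
    return dtype == "Category" and value == ""


def isnull(values: List[Any], dtype: str) -> List[bool]:
    return [is_null(value, dtype) for value in values]


def notnull(values: List[Any], dtype: str) -> List[bool]:
    # notnull is the element-wise negation of isnull.
    return [not is_null(value, dtype) for value in values]
```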

open(self, mode=None)

Open a previously closed file or reopen the file with another mode.

Args

mode : Optional open mode. If not given, self.mode is used.

pop(self, item)

Delete a dataset column or row and return it.
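A sketch of the two cases of pop, assuming for illustration that the dataset is a dict of column lists, that a string `item` selects a column and that an integer selects a row. The helper `pop_item` is hypothetical and not Spotlight's implementation.

```python
from typing import Any, Dict, List, Union


def pop_item(columns: Dict[str, List[Any]], item: Union[str, int]) -> Any:
    """Delete and return a column (str) or a row (int) (hypothetical helper)."""
    if isinstance(item, str):
        return columns.pop(item)
    # Remove the same index from every column and return it as a row dict.
    return {name: values.pop(item) for name, values in columns.items()}
```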

prune(self)

Rebuild the whole dataset with the same content.

This method can be useful after column deletions, in order to decrease the dataset file size.

rebuild(self)

Update old-style columns in the dataset. Be aware that this can take some time and memory. It is useful to call prune after rebuild.

rename_column(self, old_name, new_name)

Rename a dataset column.

set_column_attributes(self, name, order=None, hidden=None, optional=None, default=None, description=None, tags=None, **attrs)

Set attributes of a column.

Args

name : Column name.

order : Optional Spotlight priority order value. None means the lowest priority.

hidden : Whether the column is hidden in Spotlight.

optional : Whether the column is optional. If a default other than None is specified, optional is automatically set to True.

default : Value to use by default if the column is optional and no value or None is given.

description : Optional column description.

tags : Optional tags for the column.

attrs : Optional additional dtype-specific attributes.
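Several methods in this module state that passing a default other than the column's empty value (None in general, NaN for float columns, the empty string for string and categorical columns) automatically makes the column optional. A minimal plain-Python sketch of that rule follows; `resolve_optional` and `is_empty_default` are hypothetical helpers, bool and int columns do not document this rule, and treating None as "no default" for the NaN and empty-string cases is an assumption based on their `default=None` signatures.

```python
import math
from typing import Any


def is_empty_default(default: Any, empty: Any) -> bool:
    """True if `default` counts as "no default" for the given empty value."""
    if default is None:
        return True  # assumption: None never counts as a real default
    if isinstance(empty, float) and math.isnan(empty):
        return isinstance(default, float) and math.isnan(default)
    return empty is not None and default == empty


def resolve_optional(optional: bool, default: Any, empty: Any = None) -> bool:
    """Model of the documented optional/default rule (hypothetical helper)."""
    return optional or not is_empty_default(default, empty)
```

For example, `resolve_optional(False, 1.5, float("nan"))` returns True, mirroring a float column created with `default=1.5`.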

to_pandas(self)

Export the dataset to pandas dataframe.

Only scalar types of the Spotlight dataset are exported, the others are printed in a warning message.

Returns

pandas.DataFrame filled with the data of the Spotlight dataset.

Example

>>> import pandas as pd
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bools", [True, False, False])
...     dataset.append_int_column("ints", [-1, 0, 1])
...     dataset.append_float_column("floats", [-1.0, 0.0, 1.0])
...     dataset.append_string_column("strings", ["a", "b", "c"])
...     dataset.append_datetime_column("datetimes", optional=True)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     df = dataset.to_pandas()
>>> print(len(df))
3
>>> print(df.columns.sort_values())
Index(['bools', 'datetimes', 'floats', 'ints', 'strings'], dtype='object')