dataset
This module provides the Spotlight dataset.
Classes
Dataset(filepath, mode)
Spotlight dataset.
Static methods
check_column_name(name)
Check a column name.
Instance variables
filepath
Dataset file name.
mode
Dataset file open mode.
Methods
append_array_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None)
Create and optionally fill a numpy array column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
Example
>>> import numpy as np
>>> from renumics.spotlight import Dataset
>>> array_data = np.random.rand(5, 3)
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_array_column("arrays", 5*[array_data])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["arrays", 2].shape)
(5, 3)
append_audio_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False, lossy=None)
Create and optionally fill an audio column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
lookup
: Optional data lookup/flag for automatic lookup creation.
If False (default if external is True), never add data to the lookup.
If True (default if external is False), add all given files to the
lookup and do nothing for explicitly given data.
If a lookup is given, store it explicitly; further behaviour is as for
True. If the lookup is not a dict, keys are created automatically.
external
: Whether column should only contain paths/URLs to data and load it on
demand.
lossy
: Whether to store data lossy or lossless (default if external is
False). Not recommended with external=True, since it requires on-demand
transcoding, which slows down execution.
Example
Find an example usage in renumics.spotlight.media.Audio.
append_bool_column(self, name, values=None, order=None, hidden=False, optional=False, default=False, description=None, tags=None, editable=True)
Create and optionally fill a boolean column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
>>> from renumics.spotlight import Dataset
>>> value = False
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool_values", 5*[value])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["bool_values", 2])
False
append_bounding_box_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)
Create and optionally fill an axis-aligned bounding box column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
append_categorical_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True, categories=None)
Create and optionally fill a categorical column.
Args
name
: Column name.
categories
: The allowed categories for this column ("" is not allowed).
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than empty string is
specified, optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
Find an example usage in renumics.spotlight.dtypes.Category.
append_column(self, name, dtype, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, **attrs)
Create and optionally fill a dataset column of the given type.
Args
name
: Column name.
dtype
: Column type.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
attrs
: Optional arguments for the respective append column method.
Example
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_column("int", int, range(5))
...     dataset.append_column("float", float, 1.0)
...     dataset.append_column("bool", bool, True)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(len(dataset))
...     print(sorted(dataset.keys()))
5
['bool', 'float', 'int']
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["int"])
...     print(dataset["bool"])
...     print(dataset["float"])
[0 1 2 3 4]
[ True  True  True  True  True]
[1. 1. 1. 1. 1.]
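In the example above, the float and bool columns are created from a single value that ends up repeated for every row. The following is a minimal sketch of that rule in plain Python. The helper fill_values is hypothetical and not part of renumics.spotlight, and treating strings as single values is an assumption.

```python
from typing import Any, List


def fill_values(values: Any, length: int) -> List[Any]:
    """Expand `values` to `length` entries (hypothetical helper, not Spotlight API)."""
    # Assumption: strings and bytes count as single values, not as sequences.
    if isinstance(values, (str, bytes)) or not hasattr(values, "__iter__"):
        return [values] * length
    expanded = list(values)
    if len(expanded) != length:
        raise ValueError(f"expected {length} values, got {len(expanded)}")
    return expanded


print(fill_values(1.0, 5))       # [1.0, 1.0, 1.0, 1.0, 1.0]
print(fill_values(range(5), 5))  # [0, 1, 2, 3, 4]
```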
append_dataset(self, dataset)
Append a dataset to the current dataset row-wise.
append_datetime_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None)
Create and optionally fill a datetime column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
Example
>>> import numpy as np
>>> import datetime
>>> from renumics.spotlight import Dataset
>>> date = datetime.datetime.now()
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_datetime_column("dates", 5*[date])
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(dataset["dates", 2] < datetime.datetime.now())
True
append_embedding_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, length=None, dtype='float32')
Create and optionally fill an embedding column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
dtype
: A valid float numpy dtype. Default is "float32".
Example
Find an example usage in renumics.spotlight.dtypes.Embedding.
append_float_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)
Create and optionally fill a float column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than NaN is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
Find a similar example usage in
renumics.spotlight.dataset.Dataset.append_bool_column.
append_image_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)
Create and optionally fill an image column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
lookup
: Optional data lookup/flag for automatic lookup creation.
If False (default if external is True), never add data to the lookup.
If True (default if external is False), add all given files to the
lookup and do nothing for explicitly given data.
If a lookup is given, store it explicitly; further behaviour is as for
True. If the lookup is not a dict, keys are created automatically.
external
: Whether column should only contain paths/URLs to data and load it on
demand.
Example
Find an example usage in renumics.spotlight.dtypes.Image.
append_int_column(self, name, values=None, order=None, hidden=False, optional=False, default=-1, description=None, tags=None, editable=True)
Create and optionally fill an integer column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
Find a similar example usage in
renumics.spotlight.dataset.Dataset.append_bool_column.
append_mesh_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)
Create and optionally fill a mesh column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
lookup
: Optional data lookup/flag for automatic lookup creation.
If False (default if external is True), never add data to the lookup.
If True (default if external is False), add all given files to the
lookup and do nothing for explicitly given data.
If a lookup is given, store it explicitly; further behaviour is as for
True. If the lookup is not a dict, keys are created automatically.
external
: Whether column should only contain paths/URLs to data and load it on
demand.
Example
Find an example usage in renumics.spotlight.dtypes.Mesh.
append_row(self, **values)
Append a row to the dataset.
Args
values
: A mapping column name -> value. Keys of values should match the
dataset column names exactly, except for optional columns, which may be
omitted.
Example
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool_values")
...     dataset.append_float_column("float_values")
>>> data = {"bool_values": True, "float_values": 0.2}
>>> with Dataset("docs/example.h5", "a") as dataset:
...     dataset.append_row(**data)
...     dataset.append_row(**data)
...     print(dataset["float_values", 1])
0.2
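The key-matching rule for append_row can be pictured with a small stand-alone check. This is only a sketch: missing_row_keys is a hypothetical helper, the optional flags are made up for illustration, and raising KeyError for unknown names is an assumption rather than documented Spotlight behaviour.

```python
from typing import Dict, Iterable, Set


def missing_row_keys(values: Iterable[str], optional: Dict[str, bool]) -> Set[str]:
    """Return required columns absent from `values` (hypothetical helper).

    `optional` maps every column name to its optional flag.
    """
    given = set(values)
    unknown = given - set(optional)
    if unknown:
        # Assumption: keys that name no existing column are rejected.
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    return {
        name
        for name, is_optional in optional.items()
        if not is_optional and name not in given
    }


columns = {"bool_values": False, "float_values": True}  # illustrative flags
print(missing_row_keys({"bool_values": True}, columns))  # set()
print(missing_row_keys({"float_values": 0.2}, columns))  # {'bool_values'}
```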
append_sequence_1d_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, x_label=None, y_label=None)
Create and optionally fill a 1d-sequence column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
x_label
: Optional x-axis label.
y_label
: Optional y-axis label. If None, the column name is taken.
Example
Find an example usage in renumics.spotlight.dtypes.Sequence1D.
append_string_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)
Create and optionally fill a string column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than empty string is
specified, optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
Find a similar example usage in
renumics.spotlight.dataset.Dataset.append_bool_column.
append_video_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, lookup=None, external=False)
Create and optionally fill a video column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
lookup
: Optional data lookup/flag for automatic lookup creation.
If False (default if external is True), never add data to the lookup.
If True (default if external is False), add all given files to the
lookup and do nothing for explicitly given data.
If a lookup is given, store it explicitly; further behaviour is as for
True. If the lookup is not a dict, keys are created automatically.
external
: Whether column should only contain paths/URLs to data and load it on
demand.
append_window_column(self, name, values=None, order=None, hidden=False, optional=False, default=None, description=None, tags=None, editable=True)
Create and optionally fill a window column.
Args
name
: Column name.
values
: Optional column values. If a single value, the whole column is
filled with this value.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
editable
: Whether column is editable in Spotlight.
Example
Find an example usage in renumics.spotlight.dtypes.Window.
close(self)
Close file.
from_csv(self, filepath, dtypes=None, columns=None, workdir=None)
Import a csv file to the dataset.
Args
filepath
: Path of csv file to read.
dtypes
: Optional dict with mapping column name -> column type, with column
types allowed by Spotlight.
columns
: Optional columns to read from csv. If not set, read all columns.
workdir
: Optional folder where audio/images/meshes are stored. If None, the
csv folder is used.
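The workdir fallback can be sketched with pathlib. The helper resolve_media_path is hypothetical, and the idea that relative file names from the csv are joined onto workdir is an assumption about how the folder is used, not something this page states.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_media_path(csv_path: str, value: str, workdir: Optional[str] = None) -> str:
    """Resolve a media file name found in a csv cell (hypothetical helper)."""
    # If no workdir is given, fall back to the folder containing the csv file.
    base = PurePosixPath(workdir) if workdir is not None else PurePosixPath(csv_path).parent
    return str(base / value)


print(resolve_media_path("data/table.csv", "images/cat.png"))            # data/images/cat.png
print(resolve_media_path("data/table.csv", "images/cat.png", "/media"))  # /media/images/cat.png
```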
from_pandas(self, df, index=False, dtypes=None, workdir=None)
Import a pandas dataframe to the dataset.
Only scalar types supported by the Spotlight dataset are imported; the others are printed in a warning message.
Args
df
: pandas.DataFrame to import.
index
: Whether to import the index of the dataframe as a regular dataset
column.
dtypes
: Optional dict with mapping column name -> column type, with column
types allowed by Spotlight.
workdir
: Optional folder where audio/images/meshes are stored. If None, the
current folder is used.
Example
>>> from datetime import datetime
>>> import pandas as pd
>>> from renumics.spotlight import Dataset
>>> df = pd.DataFrame(
...     {
...         "bools": [True, False, False],
...         "ints": [-1, 0, 1],
...         "floats": [-1.0, 0.0, 1.0],
...         "strings": ["a", "b", "c"],
...         "datetimes": datetime.now().astimezone(),
...     }
... )
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.from_pandas(df, index=False)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     print(len(dataset))
...     print(sorted(dataset.keys()))
3
['bools', 'datetimes', 'floats', 'ints', 'strings']
get_column_attributes(self, name)
Get attributes of a column. Available but unset attributes contain None.
Args
name
: Column name.
Example
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_int_column("int", range(5))
...     dataset.append_int_column(
...         "int1",
...         hidden=True,
...         default=10,
...         description="integer column",
...         tags=["important"],
...         editable=False,
...     )
>>> with Dataset("docs/example.h5", "r") as dataset:
...     attributes = dataset.get_column_attributes("int")
...     for key in sorted(attributes.keys()):
...         print(key, attributes[key])
default -1
description None
editable True
hidden False
optional True
order None
tags None
>>> with Dataset("docs/example.h5", "r") as dataset:
...     attributes = dataset.get_column_attributes("int1")
...     for key in sorted(attributes.keys()):
...         print(key, attributes[key])
default 10
description integer column
editable False
hidden True
optional True
order None
tags ['important']
get_dtype(self, column_name)
Get type of dataset column.
Args
column_name
: Column name.
Example
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bool")
...     dataset.append_datetime_column("datetime")
...     dataset.append_array_column("array")
...     dataset.append_mesh_column("mesh")
>>> with Dataset("docs/example.h5", "r") as dataset:
...     for column_name in sorted(dataset.keys()):
...         print(column_name, dataset.get_dtype(column_name))
array array
bool bool
datetime datetime
mesh Mesh
insert_row(self, index, values)
Insert a row into the dataset at the given index.
Example
>>> from renumics.spotlight import Dataset
>>> with Dataset("example.h5", "w") as dataset:
...     dataset.append_float_column("floats", [-1.0, 0.0, 1.0])
...     dataset.append_int_column("ints", [-1, 0, 2])
...     print(len(dataset))
...     print(dataset["floats"])
...     print(dataset["ints"])
3
[-1.  0.  1.]
[-1  0  2]
>>> with Dataset("example.h5", "a") as dataset:
...     dataset.insert_row(2, {"floats": float("nan"), "ints": 1000})
...     dataset.insert_row(-3, {"floats": 3.14, "ints": -1000})
...     print(len(dataset))
...     print(dataset["floats"])
...     print(dataset["ints"])
5
[-1.    3.14  0.     nan  1.  ]
[   -1 -1000     0  1000     2]
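The positions in the example above match what Python's built-in list.insert does with the same indices, including the negative one. The sketch below replays the "ints" column on a plain list. Whether insert_row follows list.insert in every edge case, such as out-of-range indices, is not stated here.

```python
# Replay the "ints" column from the example on a plain Python list.
ints = [-1, 0, 2]

ints.insert(2, 1000)    # insert before position 2 -> [-1, 0, 1000, 2]
ints.insert(-3, -1000)  # -3 counts from the end of the 4-element list

print(ints)  # [-1, -1000, 0, 1000, 2]
```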
isnull(self, column_name)
Get missing values mask for the given column.
None, NaN and category "" values are mapped to True. So the null-mask for columns of type bool, int and string always has only False values.
A Window is mapped to True only if both start and end are NaN.
iterrows(self, column_names=None)
Iterate through dataset rows.
keys(self)
Get dataset column names.
notnull(self, column_name)
Get non-missing values mask for the given column.
None, NaN and category "" values are mapped to False. So the non-null-mask for columns of type bool, int and string always has only True values.
A Window is mapped to True if at least one of its values is not NaN.
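The rules given for isnull and notnull can be written down for a single value. This is a sketch in plain Python, not the library's implementation: is_missing and is_present are hypothetical helpers, and the dtype labels are illustrative strings rather than identifiers from renumics.spotlight.

```python
import math
from typing import Any


def is_missing(value: Any, dtype: str) -> bool:
    """Model the documented null-mask rule for one value (hypothetical helper)."""
    if dtype in ("bool", "int", "string"):
        return False  # these column types never report missing values
    if dtype == "category":
        return value is None or value == ""  # the empty category marks a missing value
    if dtype == "window":
        start, end = value
        return math.isnan(start) and math.isnan(end)  # both ends must be NaN
    if isinstance(value, float):
        return math.isnan(value)
    return value is None


def is_present(value: Any, dtype: str) -> bool:
    """The non-null mask is the negation of the null mask."""
    return not is_missing(value, dtype)


nan = float("nan")
print(is_missing("", "category"), is_missing("", "string"))               # True False
print(is_missing((nan, 1.0), "window"), is_present((nan, 1.0), "window"))  # False True
```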
open(self, mode=None)
Open previously closed file or reopen file with another mode.
Args
mode
: Optional open mode. If not given, use self.mode.
pop(self, item)
Delete a dataset column or row and return it.
prune(self)
Rebuild the whole dataset with the same content.
This method can be useful after column deletions, in order to decrease the dataset file size.
rebuild(self)
Update old-style columns in the dataset.
Be aware that this can take some time and memory. It is useful to call prune after rebuild.
rename_column(self, old_name, new_name)
Rename a dataset column.
set_column_attributes(self, name, order=None, hidden=None, optional=None, default=None, description=None, tags=None, **attrs)
Set attributes of a column.
Args
name
: Column name.
order
: Optional Spotlight priority order value. None means the lowest
priority.
hidden
: Whether column is hidden in Spotlight.
optional
: Whether column is optional. If default other than None is specified,
optional is automatically set to True.
default
: Value to use by default if column is optional and no value or None
is given.
description
: Optional column description.
tags
: Optional tags for the column.
attrs
: Optional additional dtype-specific attributes.
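The interaction between optional and default described above reduces to a one-line rule. The sketch below uses a hypothetical helper, effective_optional, and covers only the None case. The float, string and categorical append methods name NaN or the empty string as their "no default" value instead, which this sketch does not model.

```python
from typing import Any, Optional


def effective_optional(optional: Optional[bool], default: Any = None) -> bool:
    """Model the documented rule (hypothetical helper, not Spotlight API)."""
    # A default other than None makes the column optional regardless of the flag.
    return bool(optional) or default is not None


print(effective_optional(False, default=10))  # True
print(effective_optional(False))              # False
```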
to_pandas(self)
Export the dataset to a pandas dataframe.
Only scalar types of the Spotlight dataset are exported; the others are printed in a warning message.
Returns
pandas.DataFrame filled with the data of the Spotlight dataset.
Example
>>> import pandas as pd
>>> from renumics.spotlight import Dataset
>>> with Dataset("docs/example.h5", "w") as dataset:
...     dataset.append_bool_column("bools", [True, False, False])
...     dataset.append_int_column("ints", [-1, 0, 1])
...     dataset.append_float_column("floats", [-1.0, 0.0, 1.0])
...     dataset.append_string_column("strings", ["a", "b", "c"])
...     dataset.append_datetime_column("datetimes", optional=True)
>>> with Dataset("docs/example.h5", "r") as dataset:
...     df = dataset.to_pandas()
>>> print(len(df))
3
>>> print(df.columns.sort_values())
Index(['bools', 'datetimes', 'floats', 'ints', 'strings'], dtype='object')