Version: 1.0.0

Spotlight data types

Spotlight is a powerful tool that supports a diverse range of data types, from simple scalar values to complex objects such as images, videos, audio, and 3D meshes. With Spotlight's versatile support for these data types, you can easily unlock the full potential of your data.

Spotlight's advanced features also include embeddings generated by machine learning models or other mappings, allowing for powerful arrangement and exploration of datasets. Most data types can be automatically inferred from input data, but you can also specify them explicitly for greater control. For complex data types like images, you can input the path to the data and set the type to spotlight.Image, simplifying data exploration and analysis.

So whether you're working with scalar values, images, videos, audio, or any other supported data type, Spotlight is here to help you explore and analyze your data with ease.

Scalar data types

Scalar data types in Spotlight represent tabular data and include built-in types such as bool, int, float, datetime.datetime, as well as our custom-defined spotlight.Category. These types are displayed in Spotlight's table view, and are suitable for use in other aggregate views as input data, for coloring, or for scaling.

Scalar data types also have their own representations in the inspector view, but these are less feature-rich and flexible than those of complex data types.


Most of pandas' dtypes have corresponding types in Spotlight, which will be automatically understood and interpreted. However, any unknown or mixed column types, which are typically represented by the object type in pandas, will be converted to strings. If such a conversion is not possible, they will not be imported to Spotlight.

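import pandas as pd

df = pd.DataFrame(
    {
        "boolean": [True, False, False, True],
        "integer": range(4),
        "float": 1.0,  # scalar, broadcast to every row
        "string": "foo",  # scalar, broadcast to every row
        "categorical": pd.Categorical(["test", "train", "test", "train"]),
        "mixed": [False, 1, float("nan"), "bar"],  # stored as pandas' object dtype
    }
)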

Pandas scalars
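
To anticipate which columns will fall back to string conversion, you can inspect the pandas dtypes before loading the data. The following is only a minimal sketch using pandas itself, not part of Spotlight's API; the name object_columns is illustrative.

```python
import pandas as pd
from pandas.api import types as pdt

df = pd.DataFrame(
    {
        "boolean": [True, False, False, True],
        "integer": range(4),
        "float": 1.0,
        "string": "foo",
        "categorical": pd.Categorical(["test", "train", "test", "train"]),
        "mixed": [False, 1, float("nan"), "bar"],
    }
)

# Columns that pandas stores with the generic object dtype are the
# candidates for string conversion on import.
object_columns = [
    name for name, dtype in df.dtypes.items() if pdt.is_object_dtype(dtype)
]
print(object_columns)
```

The "mixed" column is always reported. Depending on the pandas version, plain string columns may be reported as object columns as well.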

Most of our column types also support missing values. However, unlike pandas, we do not support nullable bool and nullable int data types, so these columns will be imported to Spotlight as string columns.

df["boolean"] = df["boolean"].astype("boolean")
df["integer"] = df["integer"].astype("Int64")
df.iloc[1] = None

Pandas nullable scalars
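
If you want to know in advance which columns use pandas' nullable bool or int dtypes, and would therefore be imported as strings, a check along these lines may help. It is a sketch built only on pandas; the helper is_nullable_bool_or_int is a hypothetical name and not a Spotlight function.

```python
import pandas as pd
from pandas.api import types as pdt

df = pd.DataFrame(
    {
        "boolean": pd.array([True, None, False, True], dtype="boolean"),
        "integer": pd.array([0, None, 2, 3], dtype="Int64"),
        "float": [0.5, None, 1.5, 2.0],  # plain float64, missing value becomes NaN
    }
)


def is_nullable_bool_or_int(dtype):
    """Return True for extension dtypes holding booleans or integers,
    such as pandas' "boolean" and "Int64"."""
    if not pdt.is_extension_array_dtype(dtype):
        return False
    return pdt.is_bool_dtype(dtype) or pdt.is_integer_dtype(dtype)


nullable_columns = [
    name for name, dtype in df.dtypes.items() if is_nullable_bool_or_int(dtype)
]
print(nullable_columns)  # ['boolean', 'integer']
```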

If you wish to specify custom column types, you can still do so using the dtype argument of the function. In this case, if any column cannot be imported as specified in the dtype, an exception will be raised.

dtype = {"integer": spotlight.Category, "float": str, "mixed": spotlight.Category}
spotlight.show(df, dtype=dtype)  # raises if a column cannot be imported as specified

Pandas custom scalars


Besides using a pandas DataFrame, you can also use Spotlight's HDF5 file as input data.

To generate an HDF5 file, Spotlight provides a convenient Dataset wrapper that handles the creation of columns and the writing of the file. Once created, these HDF5 files can be easily loaded directly from the file system within the Spotlight file browser, simply by locating them in SPOTLIGHT_TABLE_FILE or its subfolders.

from renumics import spotlight

with spotlight.Dataset("example.h5", "w") as dataset:
    dataset.append_bool_column("boolean", [True, False, False, True])
    dataset.append_int_column("integer", range(4))
    dataset.append_float_column("float", 1.0)
    dataset.append_string_column("string", "foo")
    dataset.append_categorical_column("categorical", ["test", "train", "test", "train"])

H5 scalars

With the help of the Dataset wrapper, you can also add columns or rows to an already created Dataset by opening the Dataset in append mode.

import datetime

from renumics import spotlight

with spotlight.Dataset("example.h5", "w") as dataset:
    dataset.append_bool_column("boolean", [True, True, True, False], default=False)
    dataset.append_int_column("integer", range(4), default=-1)
    dataset.append_float_column("float", 1.0, optional=True)
    dataset.append_string_column("string", "foo", optional=True)
    dataset.append_categorical_column("categorical", ["test", "train", "test", "train"], optional=True)
    # datetime.datetime.now() is an assumed example value
    dataset.append_datetime_column("datetime", datetime.datetime.now(), optional=True)

# Reopen the existing file in append mode ("a") and overwrite the second row.
with spotlight.Dataset("example.h5", "a") as dataset:
    dataset[1] = {key: None for key in dataset.keys()}
spotlight.show("example.h5")

H5 scalars

Complex data types

Complex Spotlight data types represent non-tabular objects such as arrays, images, and meshes.

Unlike scalar data types, they can neither be fully shown in the data table nor used in most of the other aggregate views (except for embeddings in the similarity map). Instead, complex data types mostly have a rich representation in the inspector view.


Embeddings can be seen as 1D arrays that all have the same length within a column. They are primarily used in the similarity map.
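
Before loading an embedding column, it can be useful to verify that every entry is a 1D array and that all entries share one length. The sketch below uses NumPy only; embedding_length is a hypothetical helper, not a Spotlight function.

```python
import numpy as np


def embedding_length(values):
    """Return the shared length of all embeddings in `values`.

    Raises ValueError if `values` is empty, if any entry is not 1D,
    or if the entries differ in length.
    """
    lengths = set()
    for value in values:
        array = np.asarray(value)
        if array.ndim != 1:
            raise ValueError(f"expected a 1D array, got {array.ndim} dimensions")
        lengths.add(array.shape[0])
    if len(lengths) != 1:
        raise ValueError(f"expected exactly one embedding length, got {sorted(lengths)}")
    return lengths.pop()


print(embedding_length([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))  # 3
```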


To load meshes, we use the trimesh library. In general, you can use any of trimesh's supported mesh formats, and Spotlight will display them in the inspector.


Columns with complex data are never interpreted automatically and should be explicitly specified in the dtype argument of the function.