Version: 1.6.0

Loading data

With Spotlight you can interactively explore your unstructured data directly from your dataframe. When the data is loaded into Spotlight, the tabular data (e.g. labels, metadata) is loaded into memory and you can use the web frontend to perform efficient in-memory analytics. Unstructured data samples (e.g. images, video, audio, time series) are loaded lazily from disk or web storage.

Supported data formats

Spotlight can be started either through the Python API or via the command line interface (CLI). Three dataset representations are supported: Pandas dataframes, Hugging Face datasets, and Spotlight datasets based on the HDF5 format.

If you load your dataset via the CLI, you specify the file to be loaded. In this example we load a CSV file:

spotlight mnist-tiny.csv
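To try the CLI without an existing dataset, a small CSV file can be generated with Python's standard library. This is only a sketch: the file name `demo.csv`, the column names, and the image paths are hypothetical and not part of Spotlight.

```python
import csv
from pathlib import Path

# Hypothetical rows: tabular metadata plus a path to an image on disk.
ROWS = [
    {"label": 0, "split": "train", "image": "images/0001.png"},
    {"label": 7, "split": "train", "image": "images/0002.png"},
    {"label": 3, "split": "test", "image": "images/0003.png"},
]


def write_demo_csv(path):
    """Write ROWS to a CSV file with a header row and return the path."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(ROWS[0]))
        writer.writeheader()
        writer.writerows(ROWS)
    return path
```

After calling `write_demo_csv("demo.csv")`, the resulting file can be passed to the CLI in the same way as above: `spotlight demo.csv`.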

With the Python API you can additionally load in-memory datasets. This is useful when working in a notebook. Loading a Pandas dataframe is as simple as:

from renumics import spotlight

spotlight.show(df)
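The snippet above assumes that a dataframe `df` already exists. As a minimal sketch, the following builds one from a few hypothetical records (the column names and image paths are made up for illustration) and passes it to `spotlight.show`. The `pandas` and `renumics` imports are kept inside the function so that the data-building part runs on its own.

```python
def build_records():
    """Return a few hypothetical rows: tabular metadata plus an image path."""
    return [
        {"label": 0, "confidence": 0.98, "image": "images/0001.png"},
        {"label": 7, "confidence": 0.61, "image": "images/0002.png"},
        {"label": 3, "confidence": 0.87, "image": "images/0003.png"},
    ]


def explore(records):
    """Build an in-memory dataframe from the records and open it in Spotlight."""
    # Imported here so build_records() works without these packages installed.
    import pandas as pd
    from renumics import spotlight

    df = pd.DataFrame(records)
    spotlight.show(df)
```

In a notebook, `explore(build_records())` would load the three rows into Spotlight, assuming both packages are installed.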

This table gives an overview of the supported data formats:

Format    CLI    Python API
CSV, Parquet, Feather, ORC (through Pandas)
Pandas in memory
Hugging Face
Spotlight HDF5

Supported data types

Spotlight supports a wide range of data types for both tabular and unstructured data. Where possible, the data types are discovered automatically.

It is also possible to manually specify data types for certain columns:

from renumics import spotlight

dtype = {"image": spotlight.Image, "embedding": spotlight.Embedding}
spotlight.show(df, dtype=dtype)
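To make the manual type mapping more concrete, here is a sketch of a dataframe whose columns match the `dtype` dictionary above. It assumes that the `image` column holds file paths and the `embedding` column holds fixed-length lists of floats; the helper names, paths, and values are hypothetical.

```python
def build_columns(n_samples=4, dim=3):
    """Hypothetical data: one image path and one small embedding per row."""
    return {
        "image": [f"images/{i:04d}.png" for i in range(n_samples)],
        "embedding": [
            [float(i + j) for j in range(dim)] for i in range(n_samples)
        ],
    }


def explore_with_dtypes(columns):
    """Open the columns in Spotlight with explicitly specified data types."""
    # Imported here so build_columns() works without these packages installed.
    import pandas as pd
    from renumics import spotlight

    df = pd.DataFrame(columns)
    # Same mapping as in the snippet above: column name -> Spotlight data type.
    dtype = {"image": spotlight.Image, "embedding": spotlight.Embedding}
    spotlight.show(df, dtype=dtype)
```

Columns that are not listed in `dtype` are left to automatic discovery, as described above.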

We provide a more detailed description for both tabular data types and unstructured data types.

Load a Pandas dataset

Find more information on how to load your Pandas dataframe in just a few lines of code.

Hugging Face

Find more information on how to load your Hugging Face dataset in just a few lines of code.

Spotlight HDF5 dataset

Find more information on how to use the Spotlight HDF5 dataset format to load complex multimodal data.