Spotlight data types
Spotlight is a powerful tool that supports a diverse range of data types, from simple scalar values to complex objects such as images, videos, audio, and 3D meshes. With Spotlight's versatile support for these data types, you can easily unlock the full potential of your data.
Spotlight also supports embeddings, generated by machine learning models or other mappings, which enable similarity-based arrangement and exploration of datasets. Most data types are inferred automatically from the input data, but you can also specify them explicitly for greater control. For complex data types such as images, you can pass the path to the data and set the type to spotlight.Image, which simplifies data exploration and analysis.
So whether you're working with scalar values, images, videos, audio, or any other supported data type, Spotlight is here to help you explore and analyze your data with ease.
Scalar data types
Scalar data types in Spotlight represent tabular data and include built-in types such as bool, int, float, datetime.datetime, as well as our custom-defined spotlight.Category. These types are displayed in Spotlight's table view, and are suitable for use in other aggregate views as input data, for coloring, or for scaling.
Scalar data types have their own representations in the inspector view, but these are less feature-rich and flexible than those of complex data types.
Pandas
Most of pandas' dtypes have corresponding types in Spotlight and are understood and interpreted automatically. However, unknown or mixed column types, which pandas typically represents with the object dtype, are converted to strings. If such a conversion is not possible, the column is not imported into Spotlight.
import pandas as pd
from renumics import spotlight

df = pd.DataFrame(
    {
        "boolean": [True, False, False, True],
        "integer": range(4),
        "float": 1.0,  # scalars are broadcast to all rows
        "string": "foo",
        "categorical": pd.Categorical(["test", "train", "test", "train"]),
        "datetime": pd.Timestamp.now(),
        "mixed": [False, 1, float("nan"), "bar"],  # object dtype, converted to strings
    }
)
spotlight.show(df)
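To see in advance which columns fall back to string conversion, you can check which ones pandas stores with the object dtype. The following is a minimal pandas-only sketch; `find_object_columns` is a hypothetical helper for illustration and is not part of Spotlight.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def find_object_columns(df: pd.DataFrame) -> list:
    """Return the names of columns stored with the generic object dtype."""
    return [name for name in df.columns if is_object_dtype(df[name])]


df = pd.DataFrame(
    {
        "integer": range(4),
        "float": 1.0,
        "mixed": [False, 1, float("nan"), "bar"],  # mixed types give object dtype
    }
)

# Columns listed here are candidates for string conversion on import.
object_columns = find_object_columns(df)
print(object_columns)
```

Columns reported by such a check are good candidates for an explicit cast in pandas before the DataFrame is shown.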
Most of our column types also support missing values. However, unlike pandas, we do not support nullable bool and nullable int data types, so these columns will be imported to Spotlight as string columns.
df["boolean"] = df["boolean"].astype("boolean")
df["integer"] = df["integer"].astype("Int64")
df.iloc[1] = None
spotlight.show(df)
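If you would rather keep such a column numeric than have it imported as strings, one possible workaround is to cast the nullable integer column to a plain float column in pandas first, so that missing values become NaN. This is only a sketch: it assumes that float columns with NaN are among the column types that support missing values, and the column names are made up for illustration.

```python
import pandas as pd

# A nullable integer column with one missing value (pd.NA).
df = pd.DataFrame({"integer": pd.array([0, None, 2, 3], dtype="Int64")})

# Cast to a regular float column; pd.NA becomes NaN.
# Assumption: Spotlight imports float columns with NaN as float columns.
df["integer_as_float"] = df["integer"].astype("float64")

print(df.dtypes)
```

The trade-off is that integers are then displayed as floats, so this only fits columns where that is acceptable.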
If you wish to specify custom column types, you can still do so using the dtype argument of the spotlight.show function. In this case, if any column cannot be imported as specified in the dtype, an exception will be raised.
spotlight.show(
    df,
    dtype={
        "integer": spotlight.Category,
        "float": str,
        "mixed": spotlight.Category,
    },
)
H5
Besides using a pandas DataFrame, you can also use Spotlight's HDF5 file as input data.
To generate an HDF5 file, Spotlight provides a convenient Dataset wrapper that handles creating the columns and writing the file. Once created, these HDF5 files can be loaded directly from the file system in the Spotlight file browser, simply by locating them in SPOTLIGHT_TABLE_FILE or its subfolders.
from datetime import datetime

from renumics import spotlight

with spotlight.Dataset("example.h5", "w") as dataset:
    dataset.append_bool_column("boolean", [True, False, False, True])
    dataset.append_int_column("integer", range(4))
    dataset.append_float_column("float", 1.0)
    dataset.append_string_column("string", "foo")
    dataset.append_categorical_column(
        "categorical", ["test", "train", "test", "train"]
    )
    dataset.append_datetime_column("datetime", datetime.now())
spotlight.show("example.h5")
With the help of the Dataset wrapper, you can also add columns or rows to an already created Dataset by opening the Dataset in append mode.
with spotlight.Dataset("example.h5", "w") as dataset:
    dataset.append_bool_column("boolean", [True, True, True, False], default=False)
    dataset.append_int_column("integer", range(4), default=-1)
    dataset.append_float_column("float", 1.0, optional=True)
    dataset.append_string_column("string", "foo", optional=True)
    dataset.append_categorical_column(
        "categorical", ["test", "train", "test", "train"], optional=True
    )
    dataset.append_datetime_column("datetime", datetime.now(), optional=True)
with spotlight.Dataset("example.h5", "a") as dataset:
    dataset[1] = {key: None for key in dataset.keys()}
spotlight.show("example.h5")
Complex data types
Complex Spotlight data types represent non-tabular objects such as arrays, images, and meshes.
Unlike scalar data types, they can neither be shown in full in the data table nor be used in most other aggregate views (except for embeddings in the similarity map). Instead, complex data types mostly have a rich representation in the inspector view.
Embeddings
Embeddings can be seen as 1D arrays that all have the same length within a column. They are primarily used in the similarity map.
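The same-length requirement can be checked in pandas before the data is shown. The sketch below builds a column of random 1D arrays and collects their lengths. `embedding_lengths` is a hypothetical helper, the data is synthetic, and the dtype name in the final comment is an assumption made by analogy with spotlight.Image.

```python
import numpy as np
import pandas as pd


def embedding_lengths(column: pd.Series) -> set:
    """Collect the distinct lengths of the 1D arrays stored in a column."""
    return {np.asarray(value).shape[0] for value in column}


rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    {
        "label": ["a", "b", "c", "d"],
        # One 8-dimensional vector per row (synthetic data for illustration).
        "embedding": [rng.random(8) for _ in range(4)],
    }
)

lengths = embedding_lengths(df["embedding"])
# A single distinct length means the column is a consistent embedding column.
print(lengths)

# The column could then be passed with an explicit dtype, for example
# dtype={"embedding": spotlight.Embedding} (assumed name, see the note above).
```

If the set contains more than one length, the column mixes vectors of different dimensionality and should be cleaned up first.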
Meshes
To load meshes, Spotlight uses the trimesh library. In general, you can use any of trimesh's supported mesh formats, and Spotlight will display them in the inspector view.
Pandas
Columns with complex data are never interpreted automatically and must be specified explicitly in the dtype argument of the spotlight.show function.
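As a rough illustration, an image column can be prepared as a plain column of file paths, with the complex type supplied through the dtype argument. In this sketch the folder and file names are hypothetical placeholders, and the Spotlight call is left as a comment so that the snippet runs with pandas alone.

```python
from pathlib import Path

import pandas as pd

image_dir = Path("images")  # hypothetical folder, adjust to your data

df = pd.DataFrame(
    {
        # Paths are stored as plain strings; without an explicit dtype they
        # would be treated as an ordinary string column.
        "image": [str(image_dir / f"sample_{i}.png") for i in range(3)],
        "label": ["cat", "dog", "cat"],
    }
)
print(df["image"].tolist())

# With Spotlight installed, the complex type is then given explicitly:
# from renumics import spotlight
# spotlight.show(df, dtype={"image": spotlight.Image})
```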