Version: 1.6.0

spotlight

Renumics Spotlight allows you to quickly explore your datasets.

To serve an interactive view of your dataset, simply pass it to spotlight.show.

import pandas as pd
from renumics import spotlight

df = pd.DataFrame(
    {
        "int": range(4),
        "str": "foo",
        "dt": pd.Timestamp("2017-01-01T12"),
        "cat": pd.Categorical(["foo", "bar"] * 2),
    }
)
spotlight.show(df)

Spotlight tries to infer supported column types from your data, but you can override them by passing a custom mapping as the dtype parameter to spotlight.show. For a detailed overview of all supported data types, see renumics.spotlight.dtypes.

viewer = spotlight.show(df, dtype={"int": "float", "str": "category"})

We try to support a wide range of data sources, such as CSV, Parquet, ORC and Hugging Face datasets.

from renumics.spotlight import dtypes

df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
spotlight.show(df, dtype={"image": dtypes.image_dtype})

import datasets
ds = datasets.load_dataset("mnist", split="test")
spotlight.show(ds)

As an alternative that fully supports and persists our column types, you can use our custom H5 dataset.

from datetime import datetime
from renumics import spotlight

with spotlight.Dataset("docs/example.h5", "w") as dataset:
    dataset.append_int_column("int", range(4))
    dataset.append_string_column("str", "foo")
    dataset.append_datetime_column("dt", datetime(2017, 1, 1, 12))
    dataset.append_categorical_column("cat", ["foo", "bar"] * 2)

spotlight.show("docs/example.h5")

To show an updated dataset or change some viewer settings (e.g. provide custom types), you can reuse the viewer instance returned by spotlight.show.

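from renumics.spotlight import dtypes

df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
viewer = spotlight.show(df)
df["str"] = "foo"
viewer.show(df)  # show the updated dataframe in the same viewer
# Viewer.show takes the dataset as its first argument, so pass it again with the new types.
viewer.show(df, dtype={"image": dtypes.image_dtype})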

Functions

show(dataset=None, folder=None, host='127.0.0.1', port='auto', layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)

Start a new Spotlight viewer.

Args

dataset : Dataset file or pandas.DataFrame (df) to open.

folder : Root folder for filebrowser and lookup of dataset files.

host : Optional host to run Spotlight at.

port : Optional port to run Spotlight at. If "auto" (default), automatically choose a random free port.

layout : Optional Spotlight layout.

no_browser : Do not show Spotlight in browser.

allow_filebrowsing : Whether to allow users to browse and open datasets. If "auto" (default), allow browsing if the given dataset or folder is a path.

wait : If True, block code execution until all Spotlight browser tabs are closed. If False, continue code execution after Spotlight starts. If "forever", block and keep Spotlight running indefinitely. If "auto" (default), choose the mode automatically: non-blocking (False) for Jupyter notebooks, IPython and other interactive sessions; blocking (True) for scripts.

dtype : Optional dict mapping column names to column types allowed by Spotlight (for dataframes only).

analyze : Automatically analyze common dataset issues (disabled by default).

issues : Custom dataset issues displayed in the viewer.

embed : Automatically embed all or given columns with default embedders (disabled by default).
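
The "auto" defaults for port and wait can be pictured with a small standard-library sketch. It is not Spotlight's implementation: pick_free_port, is_interactive and resolve_wait are hypothetical helper names, and the interactive-session check is only one plausible way to tell notebooks and REPLs apart from scripts.

```python
import socket
import sys


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host` (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def is_interactive():
    """Best-effort guess whether the code runs in a REPL, IPython or Jupyter."""
    ipython = sys.modules.get("IPython")
    if ipython is not None and ipython.get_ipython() is not None:
        return True
    # The plain Python REPL defines sys.ps1; `python -i` sets the interactive flag.
    return hasattr(sys, "ps1") or bool(sys.flags.interactive)


def resolve_wait(wait="auto"):
    """Map the documented `wait` values to a concrete mode (assumed logic)."""
    if wait == "auto":
        # Non-blocking in interactive sessions, blocking in scripts.
        return not is_interactive()
    if wait in (True, False, "forever"):
        return wait
    raise ValueError(f"unsupported wait value: {wait!r}")


if __name__ == "__main__":
    print("free port:", pick_free_port())
    print("wait mode:", resolve_wait("auto"))
```

Note that a port found this way can be taken by another process before it is used, so passing port="auto" to show is usually simpler than choosing one by hand.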

close(port='last')

Close an active Spotlight viewer.

Args

port : Optional port number at which the Spotlight viewer is running. If "last" (default), close the last started Spotlight viewer.

Raises

ViewNotFoundError : If no Spotlight viewer is found at the given port.
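
To make sure a viewer started for a short task is always shut down, show and close can be paired in a context manager. This is a minimal sketch, not part of the Spotlight API: temporary_viewer is a hypothetical helper name, and it relies only on spotlight.show, spotlight.close and the Viewer attributes port and running as described in this reference.

```python
from contextlib import contextmanager


@contextmanager
def temporary_viewer(dataset, **show_kwargs):
    """Start a viewer and close it when the block exits (hypothetical helper)."""
    # Imported lazily so the helper can be defined without Spotlight installed.
    from renumics import spotlight

    # Default to a headless, non-blocking viewer unless the caller says otherwise.
    show_kwargs.setdefault("no_browser", True)
    show_kwargs.setdefault("wait", False)
    viewer = spotlight.show(dataset, **show_kwargs)
    try:
        yield viewer
    finally:
        # Skip closing if the body already shut the viewer down,
        # since close raises when no viewer is found at the port.
        if viewer.running:
            spotlight.close(viewer.port)
```

It could be used as `with temporary_viewer(df) as viewer:` followed by whatever needs the running instance, for example printing viewer.url.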

viewers()

Get all active Spotlight viewer instances.
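
A common use of viewers() is cleaning up at the end of a session. The sketch below assumes that viewers() returns an iterable of Viewer objects and uses the Viewer.close method documented further down; close_all_viewers is a hypothetical helper name, not a Spotlight function.

```python
def close_all_viewers():
    """Close every active Spotlight viewer and return how many were closed."""
    # Imported lazily so this sketch can be loaded without Spotlight installed.
    from renumics import spotlight

    # Copy the result first, in case closing a viewer changes the underlying collection.
    active = list(spotlight.viewers())
    for viewer in active:
        viewer.close()
    return len(active)
```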

clear_caches()

Clear all cached data.

Classes

Viewer(host='127.0.0.1', port='auto')

A Spotlight viewer. It corresponds to a single running Spotlight instance.

A Viewer can be created using spotlight.show.

Attributes

host : host at which Spotlight is running

port : port at which Spotlight is running

Instance variables

df

Get served DataFrame if a DataFrame is served, None otherwise.

host

The configured host setting.

port

The port the viewer is running on.

running

True if the viewer's webserver is running, False otherwise.

url

The viewer's URL.

Methods

close(self, wait=False)

Shut down the corresponding Spotlight instance.

open_browser(self)

Open the corresponding Spotlight instance in a browser.

refresh(self)

Refresh the corresponding Spotlight instance in a browser.

show(self, dataset, folder=None, layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)

Show a dataset or folder in this Spotlight viewer.

Args

dataset : Dataset file or pandas.DataFrame (df) to open.

folder : Root folder for filebrowser and lookup of dataset files.

layout : Optional Spotlight layout.

no_browser : Do not show Spotlight in browser.

allow_filebrowsing : Whether to allow users to browse and open datasets. If "auto" (default), allow browsing if the given dataset or folder is a path.

wait : If True, block code execution until all Spotlight browser tabs are closed. If False, continue code execution after Spotlight starts. If "forever", block and keep Spotlight running indefinitely. If "auto" (default), choose the mode automatically: non-blocking (False) for Jupyter notebooks, IPython and other interactive sessions; blocking (True) for scripts.

dtype : Optional dict mapping column names to column types allowed by Spotlight (for dataframes only).

analyze : Automatically analyze common dataset issues (disabled by default).

issues : Custom dataset issues displayed in the viewer.

embed : Automatically embed all or given columns with default embedders (disabled by default).
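
Because dtype maps column names to column types, a typo in a column name is easy to miss. A small pre-check before calling show can catch it. This is a sketch under assumptions: check_dtype_columns is a hypothetical helper, not a Spotlight function, and it only compares names; it does not validate the type values themselves.

```python
def check_dtype_columns(columns, dtype):
    """Raise KeyError if `dtype` names a column that is not in `columns`."""
    known = set(columns)
    unknown = [name for name in dtype if name not in known]
    if unknown:
        raise KeyError(f"dtype refers to unknown columns: {unknown}")
    return dtype


if __name__ == "__main__":
    columns = ["int", "str", "dt", "cat"]  # e.g. list(df.columns)
    dtype = {"int": "float", "str": "category"}
    check_dtype_columns(columns, dtype)
    print("dtype mapping matches the dataframe columns")
```

With a dataframe, this would be called as check_dtype_columns(df.columns, dtype) before viewer.show(df, dtype=dtype).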

DataIssue(*args, **kwargs)

An issue affecting multiple rows of the dataset.