spotlight
Renumics Spotlight allows you to quickly explore your datasets.
To serve an interactive view of your dataset, simply pass it to spotlight.show.
import pandas as pd
from renumics import spotlight

df = pd.DataFrame(
    {
        "int": range(4),
        "str": "foo",
        "dt": pd.Timestamp("2017-01-01T12"),
        "cat": pd.Categorical(["foo", "bar"] * 2),
    }
)
spotlight.show(df)
Spotlight tries to infer supported column types from your data, but you can
override the inferred types. Supply your custom mapping as the dtype
parameter to spotlight.show. For a detailed overview of all supported data
types, see renumics.spotlight.dtypes.
viewer = spotlight.show(df, dtype={"int": "float", "str": "category"})
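Because the dtype mapping is keyed by column name, a misspelled key is easy to miss. The following is a minimal sketch, not part of Spotlight, of checking a mapping against the known columns before passing it to spotlight.show. The helper name check_dtype_overrides is hypothetical; the column names and type strings are the ones used in the examples above.

```python
from typing import Dict, Iterable


def check_dtype_overrides(
    columns: Iterable[str], overrides: Dict[str, str]
) -> Dict[str, str]:
    """Return `overrides` unchanged if every key names an existing column.

    Hypothetical helper (not part of Spotlight). Raises KeyError listing
    the override keys that do not match any column.
    """
    known = set(columns)
    unknown = sorted(key for key in overrides if key not in known)
    if unknown:
        raise KeyError(f"dtype overrides for unknown columns: {unknown}")
    return overrides


# Column names of the DataFrame built in the first example.
columns = ["int", "str", "dt", "cat"]
dtype = check_dtype_overrides(columns, {"int": "float", "str": "category"})
# The checked mapping can then be passed on: spotlight.show(df, dtype=dtype)
```

With a real DataFrame, df.columns can be passed as the first argument.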
We try to support a wide range of data sources, such as CSV, Parquet, ORC and Hugging Face datasets.
from renumics.spotlight import dtypes

df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
spotlight.show(df, dtype={"image": dtypes.image_dtype})
import datasets
ds = datasets.load_dataset("mnist", split="test")
spotlight.show(ds)
As an alternative that fully supports and persists our column types, you can use our custom H5 dataset.
from datetime import datetime
from renumics import spotlight

with spotlight.Dataset("docs/example.h5", "w") as dataset:
    dataset.append_int_column("int", range(4))
    dataset.append_string_column("str", "foo")
    dataset.append_datetime_column("dt", datetime(2017, 1, 1, 12))
    dataset.append_categorical_column("cat", ["foo", "bar"] * 2)

spotlight.show("docs/example.h5")
To show an updated dataset or change some viewer settings (e.g. provide custom
types), you can reuse the viewer instance returned by spotlight.show.
df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
viewer = spotlight.show(df)
df["str"] = "foo"
viewer.show(df)
viewer.show(dtype={"image": dtypes.image_dtype})
Functions
show(dataset=None, folder=None, host='127.0.0.1', port='auto', layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)
Start a new Spotlight viewer.
Args
dataset
: Dataset file or pandas.DataFrame (df) to open.
folder
: Root folder for filebrowser and lookup of dataset files.
host
: Optional host to run Spotlight at.
port
: Optional port to run Spotlight at.
If "auto" (default), automatically choose a random free port.
layout
: Optional Spotlight layout.
no_browser
: Do not show Spotlight in browser.
allow_filebrowsing
: Whether to allow users to browse and open datasets.
If "auto" (default), allow browsing if dataset_or_folder is a path.
wait
: If True, block code execution until all Spotlight browser tabs are closed.
If False, continue code execution after Spotlight start.
If "forever", keep Spotlight running forever, but block.
If "auto" (default), choose the mode automatically: non-blocking (False) for
jupyter notebook, ipython and other interactive sessions;
blocking (True) for scripts.
dtype
: Optional dict with mapping column name -> column type with
column types allowed by Spotlight (for dataframes only).
analyze
: Automatically analyze common dataset issues (disabled by default).
issues
: Custom dataset issues displayed in the viewer.
embed
: Automatically embed all or given columns with default
embedders (disabled by default).
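With port="auto", Spotlight picks a free port for you. If you need to know the port before the viewer starts (for example, to configure a proxy), one common approach is to ask the operating system for an unused port and pass it explicitly. The sketch below is not Spotlight's own implementation; find_free_port is a hypothetical helper, and another process could still claim the port between the check and the viewer start.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`.

    Hypothetical helper, not part of Spotlight.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


port = find_free_port()
# The chosen port could then be passed explicitly, e.g.:
# spotlight.show(df, port=port, no_browser=True, wait=False)
```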
close(port='last')
Close an active Spotlight viewer.
Args
port
: Optional port number at which the Spotlight viewer is running.
If "last" (default), close the last started Spotlight viewer.
Raises
ViewNotFoundError
: If no Spotlight viewer is found at the given port.
viewers()
Get all active Spotlight viewer instances.
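As an illustration of combining viewers() with the close method of each viewer, the following sketch shuts down every active viewer, for example at the end of a notebook session. The helper names are hypothetical; the code relies only on the port attribute and close method documented for Viewer, and the Spotlight import is deferred so the helper itself has no dependencies.

```python
from typing import Iterable, List


def close_all(viewers: Iterable) -> List[int]:
    """Close each viewer and return the ports that were closed.

    Hypothetical helper: works with any objects that expose a `port`
    attribute and a `close()` method, as documented for Viewer.
    """
    closed = []
    for viewer in list(viewers):  # copy, in case closing alters the source list
        port = viewer.port  # read the port before shutting the viewer down
        viewer.close()
        closed.append(port)
    return closed


def close_all_spotlight_viewers() -> List[int]:
    """Close every active Spotlight viewer (requires Spotlight to be installed)."""
    from renumics import spotlight  # deferred import

    return close_all(spotlight.viewers())
```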
clear_caches()
Clear all cached data.
Classes
Viewer(host='127.0.0.1', port='auto')
A Spotlight viewer. It corresponds to a single running Spotlight instance.
A Viewer can be created using spotlight.show.
Attributes
host
: host at which Spotlight is running
port
: port at which Spotlight is running
Instance variables
df
Get the served DataFrame if a DataFrame is served, None otherwise.
host
The configured host setting.
port
The port the viewer is running on.
running
True if the viewer's webserver is running, False otherwise.
url
The viewer's url.
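The instance variables above can be combined into a short status message, for example for logging. This is a minimal sketch: describe_viewer is a hypothetical helper, and the stand-in object (including its URL value) is made up so that the snippet runs without starting a server. A real Viewer is returned by spotlight.show.

```python
from types import SimpleNamespace


def describe_viewer(viewer) -> str:
    """Summarize a viewer using its documented `url` and `running` attributes."""
    state = "running" if viewer.running else "stopped"
    return f"Spotlight viewer at {viewer.url} ({state})"


# Stand-in with the documented attribute names; the values are illustrative.
example = SimpleNamespace(
    host="127.0.0.1", port=5000, running=True, url="http://127.0.0.1:5000/"
)
print(describe_viewer(example))
```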
Methods
close(self, wait=False)
Shut down the corresponding Spotlight instance.
open_browser(self)
Open the corresponding Spotlight instance in a browser.
refresh(self)
Refresh the corresponding Spotlight instance in a browser.
show(self, dataset, folder=None, layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)
Show a dataset or folder in this Spotlight viewer.
Args
dataset
: Dataset file or pandas.DataFrame (df) to open.
folder
: Root folder for filebrowser and lookup of dataset files.
layout
: Optional Spotlight layout.
no_browser
: Do not show Spotlight in browser.
allow_filebrowsing
: Whether to allow users to browse and open datasets.
If "auto" (default), allow browsing if dataset_or_folder is a path.
wait
: If True, block code execution until all Spotlight browser tabs are closed.
If False, continue code execution after Spotlight start.
If "forever", keep Spotlight running forever, but block.
If "auto" (default), choose the mode automatically: non-blocking (False) for
jupyter notebook, ipython and other interactive sessions;
blocking (True) for scripts.
dtype
: Optional dict with mapping column name -> column type with
column types allowed by Spotlight (for dataframes only).
analyze
: Automatically analyze common dataset issues (disabled by default).
issues
: Custom dataset issues displayed in the viewer.
embed
: Automatically embed all or given columns with default
embedders (disabled by default).
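To make the wait="auto" rule concrete, the sketch below resolves the setting to a concrete mode by checking whether the code runs in an interactive session. It is a heuristic for illustration only, not Spotlight's actual detection logic; is_interactive and resolve_wait are hypothetical names.

```python
import sys


def is_interactive() -> bool:
    """Heuristic: True in a REPL, IPython or Jupyter session, False in a script."""
    if hasattr(sys, "ps1"):  # set by the standard interactive interpreter
        return True
    if sys.flags.interactive:  # Python was started with the -i option
        return True
    ipython = sys.modules.get("IPython")  # only present if IPython was imported
    return ipython is not None and ipython.get_ipython() is not None


def resolve_wait(wait):
    """Map "auto" to False (interactive) or True (script); pass other values through."""
    if wait == "auto":
        return not is_interactive()
    return wait
```

Passing wait=True, wait=False or wait="forever" explicitly avoids relying on any such detection.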
DataIssue(*args, **kwargs)
An Issue affecting multiple rows of the dataset
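A custom issue is typically built from the indices of the affected rows. The sketch below collects the rows with missing values using only the standard library and then wraps them in a DataIssue. The keyword fields used here (title, rows, severity, description) are an assumption, since the signature above only shows *args and **kwargs; check them against your installed version. The helper names are hypothetical.

```python
from typing import List, Sequence


def rows_with_missing_values(values: Sequence) -> List[int]:
    """Return the positions of all entries that are None."""
    return [index for index, value in enumerate(values) if value is None]


def build_missing_value_issue(values: Sequence):
    """Wrap the affected rows in a DataIssue (requires Spotlight)."""
    from renumics import spotlight  # deferred import

    # Assumption: DataIssue accepts these keyword fields.
    return spotlight.DataIssue(
        title="Missing values",
        rows=rows_with_missing_values(values),
        severity="medium",
        description="Rows where the label is missing.",
    )


labels = ["cat", None, "dog", None]
rows = rows_with_missing_values(labels)
# The resulting issue could then be passed on, e.g.:
# spotlight.show(df, issues=[build_missing_value_issue(labels)])
```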