spotlight
Renumics Spotlight allows you to quickly explore your datasets.
To serve an interactive view of your dataset, simply pass it to spotlight.show.
import pandas as pd
from renumics import spotlight
df = pd.DataFrame(
{
"int": range(4),
"str": "foo",
"dt": pd.Timestamp("2017-01-01T12"),
"cat": pd.Categorical(["foo", "bar"] * 2),
}
)
spotlight.show(df)
Spotlight tries to infer supported column types from your data, but you can
override these column types. Supply your custom mapping as the dtype
parameter to spotlight.show. For a detailed overview of all supported data
types, see renumics.spotlight.dtypes.
viewer = spotlight.show(df, dtype={"int": "float", "str": "category"})
We try to support a wide range of data sources, such as CSV, Parquet, ORC and Hugging Face datasets.
from renumics.spotlight import dtypes
df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
spotlight.show(df, dtype={"image": dtypes.image_dtype})
import datasets
ds = datasets.load_dataset("mnist", split="test")
spotlight.show(ds)
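The examples above cover CSV and Hugging Face datasets. For Parquet and ORC files, one option is to load the file into a pandas DataFrame first and pass that to spotlight.show. The following is a minimal sketch under that assumption; the helper names reader_name and show_file are hypothetical and not part of Spotlight, and pandas needs an engine such as pyarrow installed to read Parquet or ORC.

```python
from pathlib import Path

# Suffix -> name of the pandas reader function for that format.
PANDAS_READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".orc": "read_orc",
}


def reader_name(path: str) -> str:
    """Return the name of the pandas reader matching the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in PANDAS_READERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return PANDAS_READERS[suffix]


def show_file(path: str):
    """Load a tabular file with pandas and open it in Spotlight."""
    # Imported lazily so reader_name() works without pandas or Spotlight installed.
    import pandas as pd
    from renumics import spotlight

    df = getattr(pd, reader_name(path))(path)
    return spotlight.show(df)
```

Calling show_file("data/train.parquet") would then read the file with pd.read_parquet and return the viewer created by spotlight.show.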
As an alternative that fully supports and persists our column types, you can use our custom H5 dataset.
from datetime import datetime
from renumics import spotlight
with spotlight.Dataset("docs/example.h5", "w") as dataset:
dataset.append_int_column("int", range(4))
dataset.append_string_column("str", "foo")
dataset.append_datetime_column("dt", datetime(2017, 1, 1, 12))
dataset.append_categorical_column("cat", ["foo", "bar"] * 2)
spotlight.show("docs/example.h5")
To show an updated dataset or change some viewer settings (e.g. provide custom
types), you can reuse the viewer instance returned by spotlight.show.
df = pd.read_csv("https://renumics.com/data/mnist/mnist-tiny.csv")
viewer = spotlight.show(df)
df["str"] = "foo"
viewer.show(df)
viewer.show(dtype={"image": dtypes.image_dtype})
Functions
show(dataset=None, folder=None, host='127.0.0.1', port='auto', layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)
Start a new Spotlight viewer.
Args
dataset: Dataset file or pandas.DataFrame (df) to open.
folder: Root folder for the file browser and for lookup of dataset files.
host: Optional host to run Spotlight at.
port: Optional port to run Spotlight at.
If "auto" (default), automatically choose a random free port.
layout: Optional Spotlight layout.
no_browser: Do not show Spotlight in a browser.
allow_filebrowsing: Whether to allow users to browse and open datasets.
If "auto" (default), allow browsing if the given dataset or folder is a path.
wait: If True, block code execution until all Spotlight browser tabs are closed.
If False, continue code execution after Spotlight starts.
If "forever", keep Spotlight running forever and block.
If "auto" (default), choose the mode automatically: non-blocking (False) for Jupyter notebook, IPython and other interactive sessions; blocking (True) for scripts.
dtype: Optional dict mapping column name -> column type, with column types allowed by Spotlight (for dataframes only).
analyze: Automatically analyze common dataset issues (disabled by default).
issues: Custom dataset issues displayed in the viewer.
embed: Automatically embed all or given columns with default embedders (disabled by default).
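As an illustration of how the arguments above combine, the following sketch starts a viewer that does not open a browser tab and does not block the calling script. It uses only parameters from the signature above; the helper names background_show_kwargs and show_in_background are hypothetical.

```python
from typing import Any, Dict, Optional


def background_show_kwargs(port: Optional[int] = None) -> Dict[str, Any]:
    """Keyword arguments for a non-blocking start without a browser tab."""
    return {
        # "auto" lets Spotlight pick a random free port.
        "port": "auto" if port is None else port,
        "no_browser": True,
        "wait": False,
    }


def show_in_background(dataset: Any, port: Optional[int] = None):
    """Start Spotlight with the settings above and return the viewer."""
    # Imported lazily so the helper above can be used without Spotlight installed.
    from renumics import spotlight

    return spotlight.show(dataset, **background_show_kwargs(port))
```

Keeping the settings in a plain dict makes it easy to reuse the same configuration for several datasets.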
close(port='last')
Close an active Spotlight viewer.
Args
port: Optional port number at which the Spotlight viewer is running.
If "last" (default), close the last started Spotlight viewer.
Raises
ViewNotFoundError: If no Spotlight viewer is found at the given port.
viewers()
Get all active Spotlight viewer instances.
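A short sketch of how the result of viewers() might be inspected. It relies only on the host, port, running and url members documented for Viewer on this page; the helper name describe_viewers is hypothetical.

```python
from typing import Any, Iterable, List


def describe_viewers(viewers: Iterable[Any]) -> List[str]:
    """Return one human-readable line per viewer."""
    lines = []
    for viewer in viewers:
        state = "running" if viewer.running else "stopped"
        lines.append(f"{viewer.host}:{viewer.port} ({state}) {viewer.url}")
    return lines
```

With Spotlight installed, describe_viewers(spotlight.viewers()) would list every active viewer in the current process.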
clear_caches()
Clear all cached data.
Classes
Viewer(host='127.0.0.1', port='auto')
A Spotlight viewer. It corresponds to a single running Spotlight instance.
A Viewer can be created using spotlight.show.
Attributes
host: Host at which Spotlight is running.
port: Port at which Spotlight is running.
Instance variables
df
Get the served DataFrame if a DataFrame is served, None otherwise.
host
The configured host setting.
port
The port the viewer is running on.
running
True if the viewer's web server is running, False otherwise.
url
The viewer's URL.
Methods
close(self, wait=False)
Shut down the corresponding Spotlight instance.
open_browser(self)
Open the corresponding Spotlight instance in a browser.
refresh(self)
Refresh the corresponding Spotlight instance in a browser.
show(self, dataset, folder=None, layout=None, no_browser=False, allow_filebrowsing='auto', wait='auto', dtype=None, analyze=None, issues=None, embed=None)
Show a dataset or folder in this spotlight viewer.
Args
dataset: Dataset file or pandas.DataFrame (df) to open.
folder: Root folder for the file browser and for lookup of dataset files.
layout: Optional Spotlight layout.
no_browser: Do not show Spotlight in a browser.
allow_filebrowsing: Whether to allow users to browse and open datasets.
If "auto" (default), allow browsing if the given dataset or folder is a path.
wait: If True, block code execution until all Spotlight browser tabs are closed.
If False, continue code execution after Spotlight starts.
If "forever", keep Spotlight running forever and block.
If "auto" (default), choose the mode automatically: non-blocking (False) for Jupyter notebook, IPython and other interactive sessions; blocking (True) for scripts.
dtype: Optional dict mapping column name -> column type, with column types allowed by Spotlight (for dataframes only).
analyze: Automatically analyze common dataset issues (disabled by default).
issues: Custom dataset issues displayed in the viewer.
embed: Automatically embed all or given columns with default embedders (disabled by default).
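When a viewer is started without blocking, it can be useful to make sure it is shut down once the surrounding code is done. The sketch below wraps a viewer in a context manager built on the running attribute and the close method documented above; the name closing_viewer is hypothetical and not part of Spotlight.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def closing_viewer(viewer: Any) -> Iterator[Any]:
    """Yield the viewer and shut it down on exit if it is still running."""
    try:
        yield viewer
    finally:
        # Skip close() if the viewer was already shut down inside the block.
        if viewer.running:
            viewer.close()
```

A typical use would be to wrap the result of spotlight.show(df, wait=False) in a with statement, call viewer.show(...) with updated data inside it, and let the context manager close the viewer afterwards, even if an exception is raised.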
DataIssue(*args, **kwargs)
An issue affecting multiple rows of the dataset.
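The reference above does not list the constructor arguments of DataIssue, so the following is only a sketch. It assumes that DataIssue is importable from the spotlight module (as this page suggests) and that it accepts title and rows keyword arguments; both keyword names are assumptions to check against the installed version. The helpers missing_value_rows and missing_value_issue are hypothetical.

```python
from typing import Any, List, Sequence


def missing_value_rows(values: Sequence[Any]) -> List[int]:
    """Return the positions of entries that are None."""
    return [index for index, value in enumerate(values) if value is None]


def missing_value_issue(column: str, values: Sequence[Any]):
    """Build a custom issue marking the rows where `values` is None."""
    # Imported lazily so missing_value_rows() works without Spotlight installed.
    from renumics import spotlight

    # ASSUMPTION: `title` and `rows` are valid DataIssue keyword arguments.
    return spotlight.DataIssue(
        title=f"Missing values in {column!r}",
        rows=missing_value_rows(values),
    )
```

Under these assumptions, the resulting object could be passed in a list as the issues argument of spotlight.show.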