Version: 1.0.0

Find false labels with Cleanlab

We use the Cleanlab library to compute label error scores, then manually inspect the flagged data points and correct their labels.

Use Chrome to run Spotlight in Colab. Due to Colab restrictions (e.g. no websocket support), the performance is limited. Run the notebook locally for the full Spotlight experience.

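Inputs

df['labels'] contains the label for each data sample
df['probabilities'] contains the class probability vector that the model inferred for each data sample

Outputs

df_out['label_error_score'] contains a boolean flag that marks a data sample as a likely label error

Parameters

probabilities_name is the name of the column that holds the class probability vectors (default: 'probabilities')
label_name is the name of the column that holds the labels (default: 'labels')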
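As a rough illustration of the input format, the sketch below builds a tiny stand-in DataFrame and stacks its per-row probability vectors into one matrix, which is the shape the helper function works with. The data values and the name `df_toy` are made up for this example; only the column names follow the function defaults.

```python
import numpy as np
import pandas as pd

# Toy stand-in data: 4 samples, 3 classes (values are illustrative only).
df_toy = pd.DataFrame({
    "labels": [0, 1, 2, 1],
    "probabilities": [
        np.array([0.8, 0.1, 0.1]),
        np.array([0.2, 0.7, 0.1]),
        np.array([0.1, 0.2, 0.7]),
        np.array([0.9, 0.05, 0.05]),  # the model disagrees with the given label 1
    ],
})

# Each cell holds one probability vector; stacking them yields an
# (n_samples, n_classes) matrix.
probs = np.stack(df_toy["probabilities"].to_numpy())
labels = df_toy["labels"].to_numpy()

print(probs.shape)   # (4, 3)
print(labels.shape)  # (4,)
```

Each row of the stacked matrix should sum to 1, and the labels should be integer class indices that match the column order of the probability vectors.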

[Screenshot: label errors shown in Spotlight]

Imports and play as copy-n-paste functions

# Install dependencies

#@title Install required packages with PIP

!pip install renumics-spotlight cleanlab datasets

# Play as copy-n-paste functions

#@title Play as copy-n-paste functions

import json

import datasets
import numpy as np
import pandas as pd
import requests
from cleanlab.filter import find_label_issues
from renumics import spotlight


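def label_error_score_cleanlab(df, probabilities_name='probabilities', label_name='labels'):
    """Flag likely label errors; returns a DataFrame with a boolean 'label_error_score' column."""
    # Stack the per-row probability vectors into an (n_samples, n_classes) array
    probs = np.stack(df[probabilities_name].to_numpy())
    labels = df[label_name].to_numpy()

    # By default, find_label_issues returns a boolean mask (True = likely label error)
    label_issues = find_label_issues(labels, probs)

    # Reuse the input index so the result lines up with df in pd.concat(..., axis=1)
    df_out = pd.DataFrame(index=df.index)
    df_out['label_error_score'] = label_issues

    return df_out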
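To build some intuition for what a label error score captures, here is a deliberately simplified sketch of a "self-confidence" score: the probability the model assigns to the label a sample was given. This is not what `find_label_issues` does internally (Cleanlab is based on confident learning, which is more involved); the function name `self_confidence` and the toy values are assumptions made for illustration.

```python
import numpy as np


def self_confidence(labels, probs):
    """Return the predicted probability of each sample's given label."""
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    # Pick, for every row i, the probability in column labels[i].
    return probs[np.arange(len(labels)), labels]


# Toy data: 4 samples, 3 classes (illustrative values only).
labels = np.array([0, 1, 2, 1])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.9, 0.05, 0.05],  # given label is 1, but the model strongly prefers 0
])

scores = self_confidence(labels, probs)

# Sort ascending: the lowest self-confidence is the most suspicious sample.
ranking = np.argsort(scores)
print(ranking[0])  # 3
```

A low self-confidence only means that the model and the label disagree. Whether the label or the model is wrong still needs manual inspection.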

Step-by-step example on CIFAR-100

Load CIFAR-100 from the Hugging Face Hub and convert it to a pandas DataFrame

dataset = datasets.load_dataset("renumics/cifar100-enriched", split="train")
df = dataset.to_pandas()

Compute label error scores with Cleanlab

df_le = label_error_score_cleanlab(df, label_name='fine_label')
df = pd.concat([df, df_le], axis=1)
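Before opening Spotlight, it can help to check how many samples were flagged and how they are distributed across classes. The sketch below shows one way to do this with plain pandas on a small stand-in DataFrame; `df_toy` and its values are hypothetical, and only the column names `fine_label` and `label_error_score` are taken from the steps above.

```python
import pandas as pd

# Stand-in for the enriched DataFrame after the concat step (toy values).
df_toy = pd.DataFrame({
    "fine_label": [3, 3, 7, 7, 7],
    "label_error_score": [False, True, False, False, True],
})

# Summing a boolean column counts the True entries.
n_flagged = int(df_toy["label_error_score"].sum())

# Number of flagged samples per class, most affected classes first.
flagged_per_class = (
    df_toy.groupby("fine_label")["label_error_score"]
    .sum()
    .sort_values(ascending=False)
)

print(n_flagged)  # 2
print(flagged_per_class)
```

Classes with many flagged samples are a reasonable place to start the manual inspection.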

Inspect label errors and remove them with Spotlight

df_show = df.drop(columns=['embedding', 'probabilities'])
layout_url = "https://raw.githubusercontent.com/Renumics/spotlight/playbook_initial_draft/playbook/rookie/label_errors_cleanlab.json"
response = requests.get(layout_url)
layout = spotlight.layout.nodes.Layout(**json.loads(response.text))
spotlight.show(df_show, dtype={"image": spotlight.Image, "embedding_reduced": spotlight.Embedding}, layout=layout)
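Once the flagged samples have been reviewed, the decisions still have to be applied to the DataFrame. How those decisions are recorded is up to you. The sketch below assumes, purely for illustration, that the review produced a dictionary of corrected labels (`corrections`) and a list of row indices to discard (`rows_to_remove`); both names and all values are hypothetical.

```python
import pandas as pd

# Stand-in for the inspected DataFrame (toy values).
df_toy = pd.DataFrame({
    "fine_label": [3, 3, 7, 7],
    "label_error_score": [False, True, False, True],
})

# Hypothetical review outcome: row 1 gets a corrected label, row 3 is dropped.
corrections = {1: 7}
rows_to_remove = [3]

df_clean = df_toy.copy()

# Apply the corrected labels row by row.
for row_index, new_label in corrections.items():
    df_clean.loc[row_index, "fine_label"] = new_label

# Drop the samples that could not be fixed and renumber the rows.
df_clean = df_clean.drop(index=rows_to_remove).reset_index(drop=True)

print(len(df_clean))                    # 3
print(df_clean["fine_label"].tolist())  # [3, 7, 7]
```

Working on a copy keeps the original DataFrame available in case a correction needs to be revisited.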
