Create image embeddings with Towhee
We use the towhee library to create embeddings for an image dataset.
Use Chrome to run Spotlight in Colab. Due to Colab restrictions (e.g. no websocket support), the performance is limited. Run the notebook locally for the full Spotlight experience.
- inputs: df['image'] contains the paths to the images in the dataset.
- outputs: df['embedding'] contains the image embedding for each data sample.
- parameters: modelname designates the pre-trained model used to compute the embedding. You can find many more available models on the towhee operator hub.
Imports and play as copy-n-paste functions
# Install dependencies (umap-learn is needed for the dimensionality reduction step)
!pip install renumics-spotlight towhee datasets umap-learn
#@title Play as copy-n-paste functions
# Imports
import datasets
from towhee import pipeline, DataCollection
from renumics import spotlight
import pandas as pd
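def towhee_embedding(df, modelname='towhee/image-embedding-swin-base-patch4-window7-224', image_name='image'):
    # Wrap the image column in a Towhee DataCollection
    dc = DataCollection(df[image_name])
    # Load the pre-trained embedding pipeline by name
    embedding_pipeline = pipeline(modelname)
    # Compute one embedding per image
    dc_embedding = dc.map(embedding_pipeline)
    # Return the embeddings as a single-column DataFrame
    df_emb = pd.DataFrame()
    df_emb['embedding'] = dc_embedding.to_list()
    return df_emb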
Step-by-step example on CIFAR-100
Load CIFAR-100 from the Hugging Face Hub and convert it to a pandas DataFrame
dataset = datasets.load_dataset("renumics/cifar100-enriched", split="train")
df = dataset.to_pandas()
Compute embeddings with a pre-trained Swin Transformer via Towhee
df_emb = towhee_embedding(df, modelname='towhee/image-embedding-swin-base-patch4-window7-224')
df = pd.concat([df, df_emb], axis=1)
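A note on the concat step: pd.concat with axis=1 pairs rows by index label, not by position. The frame returned by towhee_embedding has a fresh 0..n-1 index, so the call above is safe as long as df also has a default index. The sketch below uses made-up toy data (the file names and vectors are not from the dataset) to show what can happen with a filtered frame, and how reset_index(drop=True) avoids it:

```python
import pandas as pd

# Toy stand-in for a filtered dataset: its index is no longer 0..n-1.
df_toy = pd.DataFrame({"image": ["a.png", "b.png", "c.png"]}, index=[10, 11, 12])

# Toy stand-in for the frame returned by towhee_embedding (default RangeIndex).
df_emb_toy = pd.DataFrame({"embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]})

# The index labels do not overlap, so concat pads both sides with NaN.
misaligned = pd.concat([df_toy, df_emb_toy], axis=1)

# Resetting the index first restores the row-by-row pairing.
aligned = pd.concat([df_toy.reset_index(drop=True), df_emb_toy], axis=1)
```

If df was sliced or filtered before embedding, resetting its index before the concat is one way to keep each image next to its own embedding.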
Reduce embeddings for faster visualization
import umap
import numpy as np
embeddings = np.stack(df['embedding'].to_numpy())
reducer = umap.UMAP()
reduced_embedding = reducer.fit_transform(embeddings)
df['embedding_reduced'] = np.array(reduced_embedding).tolist()
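The np.stack call above is needed because a DataFrame column of per-row vectors comes out as an object array, not a 2-D matrix, and the reducer expects the latter. The sketch below illustrates the round trip with made-up 4-dimensional vectors; the real embeddings are much longer, and the column slice is only a stand-in for the reducer output, not a dimensionality reduction method:

```python
import numpy as np

# Object array holding one vector per sample, like df['embedding'].to_numpy().
column = np.empty(3, dtype=object)
for i in range(3):
    column[i] = np.arange(4, dtype=np.float32) + i

# Stack into a dense (n_samples, n_features) matrix suitable for a reducer.
matrix = np.stack(column)

# Stand-in for the reducer output: a dense (n_samples, 2) array.
reduced = matrix[:, :2]

# tolist() yields one plain Python list per row, which is the form
# assigned to the 'embedding_reduced' column.
rows = reduced.tolist()
```

Storing plain lists per row keeps the reduced column in the same shape as the original embedding column, one vector per sample.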
Perform EDA with Spotlight
df_show = df.drop(columns=['embedding', 'probabilities'])
# No port is passed, so Spotlight selects a free port automatically
spotlight.show(df_show, dtype={"image": spotlight.Image, "embedding_reduced": spotlight.Embedding})