All posts
Read about our work and the latest news in the field of data-centric AI, industrial AI, LLMs, and more.
interactive, data visualization, data exploration, wine dataset
Interactive Data Insights Made Simple: Visualize with Just One Line of Code
Explore interactive data visualization with Spotlight. Dive into the wine dataset and uncover insights with our open source tool.
Marius Steger
August 28, 2023
•
7 min read
data-centric ai, data slicing, computer vision
Finding data slices in unstructured data
Data slices are semantically meaningful subsets of the data on which the model performs anomalously. We discuss current challenges and demonstrate hands-on examples of open source tooling.
Stefan Suwelack
August 12, 2023
•
9 min read
computer vision, data-visualization, data-centric ai
Changes of Embeddings during Fine-Tuning of Vision Transformers (ViT)
Fine-tuning significantly influences embeddings in image classification. Pre-fine-tuning embeddings offer general-purpose representations, whereas post-fine-tuning embeddings capture task-specific features. This distinction can lead to varying outcomes in outlier detection and other tasks. Both kinds of embeddings have their own strengths and should be used in combination for a comprehensive analysis in image classification tasks.
Markus Stoll
July 27, 2023
•
10 min read
visualization, data curation, synthetic data generation, production, data-centric AI, python, clustering
Interactive Data Exploration with Spotlight: Unveiling Critical Segments to Guide Synthetic Data Generation
Building robust models for visual inspection in production settings can be a real challenge. Cloud services like Amazon Lookout for Vision promise relief for model training but have limitations regarding data curation. This article explores those potential shortcomings and shows how to overcome them to use these services to the fullest.
Marius Steger
July 13, 2023
•
10 min read
Data-centric AI, data curation, project management, use case assessment
The Industrial AI Canvas
The Industrial AI Canvas can be a useful tool for planning data and ML-based projects.
Daniel Klitzke
April 14, 2023
•
4 min read
data-centric AI, data curation, anomaly detection, acoustics
Enriched dataset for anomalous sound event detection
If you work in ML-based acoustics, the annual DCASE challenge is a great resource to learn about new state-of-the-art methods. We built an enriched dataset for the condition monitoring task that can be downloaded from Hugging Face and explored with Spotlight in just five minutes.
Stefan Suwelack
March 20, 2023
•
4 min read
data-centric AI, data curation
Why we are building Spotlight
We have just released the open version of our data curation software Renumics Spotlight. It is intended for cross-functional teams who want to be in control of their data and data curation processes. In this post, I would like to share the ideas behind this product.
Stefan Suwelack
February 6, 2023
•
5 min read
test data, automation, acoustic event detection
Machine learning for test data analysis: Brake squeal example
Machine learning can drastically speed up the analysis of engineering test data. We use the AI-assisted Engineering Canvas to conceptualize a use case from brake squeal analysis.
Stefan Suwelack
August 8, 2022
•
7 min read
condition monitoring, data curation, audio, data exploration, EDA
Data curation checklist for condition monitoring (Part 2)
Data collection for condition monitoring has several pitfalls that can lead to data unsuitable for training robust machine learning models. The resulting data problems include, but are not limited to, failures in the recording equipment, the dominance of specific operating conditions, and mislabeled audio samples. In this article, we help you ask the right questions and equip you with a checklist you can use when collecting and preparing data for your condition monitoring use case.
Daniel Klitzke
June 23, 2022
•
5 min read
condition monitoring, data curation, audio, data exploration, EDA
Data curation checklist for condition monitoring (Part 1)
Data collection for condition monitoring has several pitfalls that can lead to data unsuitable for training robust machine learning models. The resulting data problems include, but are not limited to, failures in the recording equipment, the dominance of specific operating conditions, and mislabeled audio samples. In this article, we help you ask the right questions and equip you with a checklist you can use when collecting and preparing data for your condition monitoring use case.
Daniel Klitzke
May 24, 2022
•
6 min read