iSafetyBench

Abstract

Recent advances in vision-language models (VLMs) have enabled impressive generalization across diverse video understanding tasks under zero-shot settings. However, their capabilities in high-stakes industrial domains—where recognizing both routine operations and safety-critical anomalies is essential—remain largely underexplored. To address this gap, we introduce iSafetyBench, a new video-language benchmark specifically designed to evaluate model performance in industrial environments across both normal and hazardous scenarios. iSafetyBench comprises 1,100 video clips sourced from real-world industrial settings, annotated with open-vocabulary, multi-label action tags spanning 98 routine and 67 hazardous action categories. Each clip is paired with multiple-choice questions for both single-label and multi-label evaluation, enabling fine-grained assessment of VLMs in both standard and safety-critical contexts. We evaluate eight state-of-the-art video-language models under zero-shot conditions. Despite their strong performance on existing video benchmarks, these models struggle with iSafetyBench—particularly in recognizing hazardous activities and in multi-label scenarios. Our results reveal significant performance gaps, underscoring the need for more robust, safety-aware multimodal models for industrial applications. iSafetyBench provides a first-of-its-kind testbed to drive progress in this direction.
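
The single-label and multi-label multiple-choice evaluation described above can be illustrated with a minimal scoring sketch. The abstract does not specify the exact metrics, so the choices here (plain accuracy for single-label questions; exact match, precision, recall, and F1 over option sets for multi-label questions) are assumptions, and the function names and option letters are hypothetical rather than part of any released iSafetyBench code.

```python
from typing import Sequence, Set, Tuple


def single_label_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the chosen option equals the answer key."""
    pairs = list(zip(predictions, answers))
    if not pairs:
        return 0.0
    return sum(p == a for p, a in pairs) / len(pairs)


def multi_label_scores(predicted: Set[str], correct: Set[str]) -> Tuple[bool, float, float, float]:
    """Return (exact_match, precision, recall, f1) for one multi-label question.

    Assumed metric set: the benchmark's official scoring may differ.
    """
    true_pos = len(predicted & correct)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(correct) if correct else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return predicted == correct, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical model outputs and answer keys, for illustration only.
    print(single_label_accuracy(["B", "C", "A"], ["B", "A", "A"]))
    print(multi_label_scores({"A", "C"}, {"A", "B", "C"}))
```

Exact match is the stricter criterion for multi-label questions: a model that finds two of three correct options gets no exact-match credit but still earns partial credit under F1, which is one way to separate "missed one hazard" from "missed everything".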

Comparison with existing datasets

Dataset	Normal scenarios	Dangerous scenarios	Multi-label	Textual data	Environment type(s)	Set type	# Normal actions	# Non-critical anomaly actions	# Danger/hazard actions	# High-level categories
UCF-Crime	✔	✔	✖	✖	Multiple	Closed	0	0	13	0
InHARD	✔	✖	✖	✖	Single	Closed	74	0	0	14
TIMo	✔	✖	✖	✖	Single	Closed	35	21	0	20
OpenPack	✔	✔	✖	✖	Single	Closed	43	44	1	17
Safe/Unsafe Behaviours	✔	✔	✖	✖	Single	Closed	4	0	4	2
Construction Meta Action	✔	✔	✖	✖	Single	Closed	1	0	6	0
iSafetyBench (Ours)	✔	✔	✔	✔	Multiple	Open	98	0	67	18

BibTeX

@InProceedings{Abdullah_2025_ICCV,
    author    = {Abdullah, Raiyaan and Rawat, Yogesh Singh and Vyas, Shruti},
    title     = {iSafetyBench: A video-language benchmark for safety in industrial environment},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {1433-1442}
}
